math.factorial may throw OverflowError #64738

ncoghlan · 2014-02-07T10:54:29Z

BPO	20539
Nosy	@mdickinson, @ncoghlan, @pitrou, @vstinner, @skrah, @gareth-rees, @MojoVampire
Files	huge_factorial_input.patch huge_factorial_input_v2.patch huge_factorial_input_v3.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/mdickinson'
closed_at = <Date 2014-04-10.13:35:30.993>
created_at = <Date 2014-02-07.10:54:29.109>
labels = ['extension-modules', 'type-bug']
title = 'math.factorial may throw OverflowError'
updated_at = <Date 2014-04-10.13:35:30.991>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2014-04-10.13:35:30.991>
actor = 'mark.dickinson'
assignee = 'mark.dickinson'
closed = True
closed_date = <Date 2014-04-10.13:35:30.993>
closer = 'mark.dickinson'
components = ['Extension Modules']
creation = <Date 2014-02-07.10:54:29.109>
creator = 'ncoghlan'
dependencies = []
files = ['33961', '33965', '34777']
hgrepos = []
issue_num = 20539
keywords = ['patch']
message_count = 29.0
messages = ['210448', '210449', '210450', '210451', '210453', '210454', '210455', '210477', '210480', '210496', '210497', '210510', '210511', '210512', '210513', '210528', '210555', '210571', '215842', '215847', '215852', '215857', '215861', '215862', '215869', '215873', '215877', '215879', '215881']
nosy_count = 9.0
nosy_names = ['mark.dickinson', 'ncoghlan', 'pitrou', 'vstinner', 'skrah', 'python-dev', 'gdr@garethrees.org', 'garybernhardt', 'josh.r']
pr_nums = []
priority = 'low'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20539'
versions = ['Python 3.5']

ncoghlan · 2014-02-07T10:54:29Z

I believe this is mostly a curiousity (since actually calculating a factorial this big would take an interminable amount of time), but math.factorial can be provoked into throwing OverflowError by a large enough input:

>>> math.factorial(10**19)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> math.factorial(1e19)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

mdickinson · 2014-02-07T11:00:11Z

What behaviour would you suggest instead?

Apart from the time, the result would have over 10**20 digits, so would need exabytes of memory to represent.

>>> from math import lgamma, log
>>> lgamma(1e19) / log(10)
1.8565705518096748e+20

vstinner · 2014-02-07T11:03:45Z

What behaviour would you suggest instead?

Some others Python functions like str * int raises a MemoryError on overflow.

But I don't know if OverflowError or MemoryError is better.

mdickinson · 2014-02-07T11:13:46Z

Some others Python functions like str * int raises a MemoryError on overflow.

I have to say that that's always seemed wrong to me: IMO MemoryError should only be raised when we've really run out of memory, or perhaps where the amount of memory we're about to need exceeds all possible limits (e.g., more than 2**63 bytes on a 64-bit system).

skrah · 2014-02-07T11:40:27Z

OverflowError seems like a good choice if only values in the range
of a C long are accepted. ValueError would perhaps be more intuitive,
since the user normally only cares about the accepted range of
input values rather than the internal details.

ncoghlan · 2014-02-07T11:52:02Z

I figured it was some internal storage overflowing, and I don't have an issue with the error type. The bit that is confusing is the error *message* - what int doesn't fit into a C long? (especially in the second case of float input where no user level int is involved at all).

It's mostly a theoretical problem though - as far as I am aware, the user that hit it was deliberately testing how far the limits of the "infinite precision" integers promised in the documentation actually extend.

mdickinson · 2014-02-07T11:52:44Z

The bit that is confusing is the error *message* - what int doesn't fit into a C long?

Agreed: it would be nice to intercept this and give a better message.

pitrou · 2014-02-07T14:41:29Z

I also think OverflowError is better than MemoryError here.

gareth-rees · 2014-02-07T15:24:31Z

It's not a case of internal storage overflowing. The error is from
Modules/mathmodule.c:1426 and it's the input 10**19 that's too large
to convert to a C long. You get the same kind of error in other
places where PyLong_AsLong or PyLong_AsInt is called on a
user-supplied value, for example:

    >>> import pickle
    >>> pickle.dumps(10**19, 10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

mdickinson · 2014-02-07T17:08:27Z

Thinking about it, ValueError seems like the right exception type: nothing's actually overflowing, because we haven't even tried to do any computation. This is just a limitation of what math.factorial is prepared to accept as input (and perhaps that limitation should be documented).

Here's a patch. Bikeshedding about the exception type and the exception message is welcome.

vstinner · 2014-02-07T17:19:39Z

OverflowError makes sense because math.factorial(10**19) will overflow in CPython on common platforms, even if it didn't overflowed yet.

On a supercomputer with a different Python implementation, you may be able to compute it.

IMO An OverflowError is specific to a platform and Python implementation, whereas ValueError is "portable": any Python implementation must raise such error.

I can imagine that a Python implementation may return a pseudo-int type which is supposed to be the result of math.factorial(), so you can compute for example math.factorial(10**19) % 2 (hint: result is 0).

mdickinson · 2014-02-07T18:16:41Z

Okay; OverflowError seems to have the majority vote. I'll change it to that.

Any strong opinions on the message?

pitrou · 2014-02-07T18:21:45Z

"Outrageously" sounds like the argument is immoral, but do we have an objective test for that? :-)

(nitpicking a bit: negative values should probably raise a proper ValueError, no?)

mdickinson · 2014-02-07T18:23:30Z

nitpicking a bit: negative values should probably raise a proper ValueError, no?

I think they do, with this patch. Or maybe I'm misunderstanding?

With the current form of the patch:

>>> math.factorial(10**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: factorial() argument outrageously large
>>> math.factorial(-10**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: factorial() not defined for negative values

pitrou · 2014-02-07T18:25:06Z

Ah, well, my bad. I just wanted to have a good argument :-)

mdickinson · 2014-02-07T19:10:09Z

Updated patch.

skrah · 2014-02-07T20:55:09Z

I slightly favor the ValueError patch because there is only a single exception
to catch. PyLong_AsUnsignedLong() also raises OverflowError for both positive
values that are too large and for negative values.

Perhaps the error message could contain the actual range of allowed values,
especially since LONG_MAX differs across platforms.

The drawback is that "outrageously" would have to go then. :)

garybernhardt · 2014-02-07T22:28:16Z

Although I thoroughly enjoyed "outrageously", I agree with Stefan about including the range of values. As a user who doesn't know the implementation, "outrageously" will just leave me asking why, but indicating the range will tell me exactly why the exception was raised.

mdickinson · 2014-04-09T20:00:00Z

Updated patch.

MojoVampire · 2014-04-09T21:26:14Z

Mark, you said:

OverflowError seems to have the majority vote. I'll change it to that.

But your most recent patch seems to have gone back to ValueError for both cases. Is there a reason for that? FWIW, I favor OverflowError as the "too large" type, though I prefer the less hyperbolic phrasing of the error message in the new patch.

mdickinson · 2014-04-09T22:11:07Z

Josh: well spotted; thanks for picking up on this. I was wondering whether anyone would pick up on that, and I should have mentioned the change explicitly.

I do think ValueError is the correct exception here. My interpretation of OverflowError is that it represents an overflow occurring during the computation, and that's not what's happening here: we're flat-out rejecting an input value because it's out of the range of values we accept. Stefan's message influenced me a bit in that direction, too.

MojoVampire · 2014-04-09T23:56:18Z

I just think it's a little odd to make math.factorial uniquely sensitive to the documented definition of OverflowError. Basically every one of the non-masking PyLong_As<CType> interfaces uses OverflowError for too large values.

In fact, all the functions which are converting to unsigned C types and aren't doing so for "program logic reasons" (e.g. the shift functions don't accept a negative shift due to semantic obscurity, and long_to_bytes doesn't accept a negative length for the output) use OverflowError when a negative value is passed as well.

The PyArg_ParseTuple* functions raise OverflowError as well, either implicitly (when they call a PyLong_As<CType> function), or explicitly for those functions that impose more restrictive ranges than the PyLong function they wrap.

Long story short: The majority of C extension functions that convert to C level types are raising OverflowError to mean "out of range for implementation's chosen C type" (whether too large, too small, or negative when only unsigned values excepted) as a side-effect of using Python's documented APIs. Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.

mdickinson · 2014-04-10T01:06:35Z

Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.

Can you point to some examples of those similar functions? I can't think of any obvious examples offhand.

MojoVampire · 2014-04-10T03:38:47Z

A few examples (some are patently ridiculous, since the range of values anyone would use ends long before you'd overflow a 32 bit integer, let alone a 64 bit value on my build of Python, but bear with me:

>>> datetime.datetime(2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> datetime.datetime(-2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> time.mktime(time.struct_time([2**64]*9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> sqlite3.enable_callback_tracebacks(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

(That last one should really be changed to type code 'p' over 'i', or to 'B' since it's just a boolean, so overflow doesn't matter, just truthy/falsy behavior)

It also happens if you pass re functions/methods a too large flags value:
>>> re.sub(r'(abc)', r'\1', 'abcd', re.IGNORECASE << 64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shadowranger/src/cpython/Lib/re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
OverflowError: Python int too large to convert to C ssize_t

Skipping the tracebacks, a few more examples of functions where at least one argument can raise OverflowError for implementation specific reasons, rather than a logical overflow of some kind (which is what math.factorial's is):

os.getpriority, os.setpriority, os.waitid, os.tcsetpgrp, a central utility function (get_data) in zipimport (I believe the value it's parsing is derived from a zip header, so it may not be possible to feed it too-large values; haven't checked), quite a number of socket functions (often for stuff that should really be ValueErrors, e.g. port numbers out of range), and more random things. I found all of these with a simple:

find cpython/ -type f -name '*.c' -exec grep -nP 'PyArg_Parse.*?"\w*?[bhilL]"' {} + > exampleoverflow.txt

There aren't any other good examples in math, largely because the other functions there deal with floats (or have arbitrary precision integer fallback paths, in the case of the log suite of functions).

That find only scratches the surface; many PyArg_Parse* calls are split across lines (so my simple regex won't catch them), and old Python code has a habit of not using PyArg_Parse* even when it makes sense (presumably because they wanted to customize error messages, or didn't like the way the provided formatting codes handled edge cases).

In reality, any place PyLong_As* is called (when * is not one of the masking functions) on an argument that came from the user without explicitly checking for an replacing OverflowError will potentially trigger this issue. A cursory search of locations where this function is called reveals OverflowErrors in the r parameter to to itertools.permutations, and that decimal is riddled with cases where they return if PyLong_As* has an error (including OverflowError) without changing the exception type, then a second round of range checking will set ValueError if it didn't Overflow. Examples include Context object's prec and clamp properties, but there are a dozen or more functions doing this, and I don't know if all of them are publically accessible.

Fewer of the calls will be publically visible, so there's more to look through, but you can run the same search to find tons of places with potentially similar behavior:

find cpython/ -type f -name '*.c' -exec grep -nP 'Py(Long|Number)As(?!.*(?:Mask|NULL|PyExc(?!Overflow)))' {} + > exampleoverflowdirectcall.txt

I suspect that for every case where Python standard libs behave this way (raising OverflowErrors in ways that disregard the official docs description of when it should be used), there are a dozen where a third party module behaves this way, since third party modules are more likely to use the standardized argument parsing and numeric parsing APIs without rejiggering the default exceptions, assuming that the common APIs raise the "correct" errors.

skrah · 2014-04-10T10:58:40Z

OverflowError during initialization dates back to the early days of Python:

http://hg.python.org/cpython/rev/0437738279a8

Integer arithmetic actually did raise OverflowError back then:

>>> 2222222222222 * 22222222222222
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer multiplication
>>>

This of course no longer happens, so the whole idea of OverflowError seems
slightly outdated.

mdickinson · 2014-04-10T12:32:05Z

Looks like this harmless-looking issue has opened up a can of worms.

the whole idea of OverflowError seems slightly outdated.

Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

>>> from math import exp
>>> exp(5000.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: math range error

I'd treat conversion to float as a special case of this; that is, the following is another case where OverflowError is (IMO) the right thing to use:

>>> float(10**1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float

A less clear-cut case:

>>> sqrt(10**500)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float

This one still seems okay to me, even though the sqrt result *is* within the floating-point range. You can read this as two operations: a conversion from the integer input to a float, followed by a floating-point square root operation, and it's the intermediate result that overflows.

And a case that's clearly wrong:

>>> 53 >> 10**100  # Why not simply return 0?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

Another historical oddity is that OverflowError inherits from ArithmeticError (along with FloatingPointError and ZeroDivisionError), which supports its use for arithmetic and mathematical operations that overflow their target type.

>>> OverflowError.__mro__
(<class 'OverflowError'>, <class 'ArithmeticError'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>)

In the interests of moving this *particular* issue forward, I'll revert to OverflowError for this patch. I still personally think it's the wrong exception, but:

(1) It's not more wrong than before: we were already raising OverflowError for large positive values.
(2) It achieves the main aim of replacing an obscure error message with a more understandable one (which is what Nick wanted in the first place: msg210454).
(3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion.

We can still make the change from OverflowError to ValueError at some later point once it's clear that that's the right thing to do. But converting from OverflowError to ValueError now, and then deciding later that that was wrong and converting back to OverflowError would be a horrible thing to do.

skrah · 2014-04-10T13:08:57Z

Mark Dickinson <report@bugs.python.org> wrote:

> the whole idea of OverflowError seems slightly outdated.

Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

Indeed, I was focusing on integer arithmetic and assignments. For float
operations and conversions OverflowError is quite natural.

(3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion.

Using OverflowError for now sounds like a good plan.

python-dev · 2014-04-10T13:29:53Z

New changeset 273e17260d25 by Mark Dickinson in branch 'default':
Issue bpo-20539: Improve math.factorial error messages and types for large inputs.
http://hg.python.org/cpython/rev/273e17260d25

mdickinson · 2014-04-10T13:35:31Z

Patch applied to the default branch (but with OverflowError instead of ValueError for large positive inputs). I don't see a lot of benefit in applying the fix to the maintenance branches. Closing this issue.

mdickinson added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Feb 7, 2014

mdickinson self-assigned this Apr 9, 2014

mdickinson closed this as completed Apr 10, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

math.factorial may throw OverflowError #64738

math.factorial may throw OverflowError #64738

ncoghlan commented Feb 7, 2014

ncoghlan commented Feb 7, 2014

mdickinson commented Feb 7, 2014

vstinner commented Feb 7, 2014

mdickinson commented Feb 7, 2014

skrah mannequin commented Feb 7, 2014

ncoghlan commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

gareth-rees mannequin commented Feb 7, 2014

mdickinson commented Feb 7, 2014

vstinner commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

mdickinson commented Feb 7, 2014

skrah mannequin commented Feb 7, 2014

garybernhardt mannequin commented Feb 7, 2014

mdickinson commented Apr 9, 2014

MojoVampire mannequin commented Apr 9, 2014

mdickinson commented Apr 9, 2014

MojoVampire mannequin commented Apr 9, 2014

mdickinson commented Apr 10, 2014

MojoVampire mannequin commented Apr 10, 2014

skrah mannequin commented Apr 10, 2014

mdickinson commented Apr 10, 2014

skrah mannequin commented Apr 10, 2014

python-dev mannequin commented Apr 10, 2014

mdickinson commented Apr 10, 2014

math.factorial may throw OverflowError #64738

math.factorial may throw OverflowError #64738

Comments

ncoghlan commented Feb 7, 2014

ncoghlan commented Feb 7, 2014

mdickinson commented Feb 7, 2014

vstinner commented Feb 7, 2014

mdickinson commented Feb 7, 2014

skrah mannequin commented Feb 7, 2014

ncoghlan commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

gareth-rees mannequin commented Feb 7, 2014

mdickinson commented Feb 7, 2014

vstinner commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

mdickinson commented Feb 7, 2014

pitrou commented Feb 7, 2014

mdickinson commented Feb 7, 2014

skrah mannequin commented Feb 7, 2014

garybernhardt mannequin commented Feb 7, 2014

mdickinson commented Apr 9, 2014

MojoVampire mannequin commented Apr 9, 2014

mdickinson commented Apr 9, 2014

MojoVampire mannequin commented Apr 9, 2014

mdickinson commented Apr 10, 2014

MojoVampire mannequin commented Apr 10, 2014

skrah mannequin commented Apr 10, 2014

mdickinson commented Apr 10, 2014

skrah mannequin commented Apr 10, 2014

python-dev mannequin commented Apr 10, 2014

mdickinson commented Apr 10, 2014