Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math.factorial may throw OverflowError #64738

Closed
ncoghlan opened this issue Feb 7, 2014 · 29 comments
Closed

math.factorial may throw OverflowError #64738

ncoghlan opened this issue Feb 7, 2014 · 29 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@ncoghlan
Copy link
Contributor

ncoghlan commented Feb 7, 2014

BPO 20539
Nosy @mdickinson, @ncoghlan, @pitrou, @vstinner, @skrah, @gareth-rees, @MojoVampire
Files
  • huge_factorial_input.patch
  • huge_factorial_input_v2.patch
  • huge_factorial_input_v3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2014-04-10.13:35:30.993>
    created_at = <Date 2014-02-07.10:54:29.109>
    labels = ['extension-modules', 'type-bug']
    title = 'math.factorial may throw OverflowError'
    updated_at = <Date 2014-04-10.13:35:30.991>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2014-04-10.13:35:30.991>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2014-04-10.13:35:30.993>
    closer = 'mark.dickinson'
    components = ['Extension Modules']
    creation = <Date 2014-02-07.10:54:29.109>
    creator = 'ncoghlan'
    dependencies = []
    files = ['33961', '33965', '34777']
    hgrepos = []
    issue_num = 20539
    keywords = ['patch']
    message_count = 29.0
    messages = ['210448', '210449', '210450', '210451', '210453', '210454', '210455', '210477', '210480', '210496', '210497', '210510', '210511', '210512', '210513', '210528', '210555', '210571', '215842', '215847', '215852', '215857', '215861', '215862', '215869', '215873', '215877', '215879', '215881']
    nosy_count = 9.0
    nosy_names = ['mark.dickinson', 'ncoghlan', 'pitrou', 'vstinner', 'skrah', 'python-dev', 'gdr@garethrees.org', 'garybernhardt', 'josh.r']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20539'
    versions = ['Python 3.5']

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Feb 7, 2014

    I believe this is mostly a curiousity (since actually calculating a factorial this big would take an interminable amount of time), but math.factorial can be provoked into throwing OverflowError by a large enough input:

    >>> math.factorial(10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
    
    >>> math.factorial(1e19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

    @mdickinson
    Copy link
    Member

    What behaviour would you suggest instead?

    Apart from the time, the result would have over 10**20 digits, so would need exabytes of memory to represent.

    >>> from math import lgamma, log
    >>> lgamma(1e19) / log(10)
    1.8565705518096748e+20

    @vstinner
    Copy link
    Member

    vstinner commented Feb 7, 2014

    What behaviour would you suggest instead?

    Some others Python functions like str * int raises a MemoryError on overflow.

    But I don't know if OverflowError or MemoryError is better.

    @mdickinson
    Copy link
    Member

    Some others Python functions like str * int raises a MemoryError on overflow.

    I have to say that that's always seemed wrong to me: IMO MemoryError should only be raised when we've really run out of memory, or perhaps where the amount of memory we're about to need exceeds all possible limits (e.g., more than 2**63 bytes on a 64-bit system).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Feb 7, 2014

    OverflowError seems like a good choice if only values in the range
    of a C long are accepted. ValueError would perhaps be more intuitive,
    since the user normally only cares about the accepted range of
    input values rather than the internal details.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Feb 7, 2014

    I figured it was some internal storage overflowing, and I don't have an issue with the error type. The bit that is confusing is the error *message* - what int doesn't fit into a C long? (especially in the second case of float input where no user level int is involved at all).

    It's mostly a theoretical problem though - as far as I am aware, the user that hit it was deliberately testing how far the limits of the "infinite precision" integers promised in the documentation actually extend.

    @mdickinson
    Copy link
    Member

    The bit that is confusing is the error *message* - what int doesn't fit into a C long?

    Agreed: it would be nice to intercept this and give a better message.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 7, 2014

    I also think OverflowError is better than MemoryError here.

    @gareth-rees
    Copy link
    Mannequin

    gareth-rees mannequin commented Feb 7, 2014

    It's not a case of internal storage overflowing. The error is from
    Modules/mathmodule.c:1426 and it's the input 10**19 that's too large
    to convert to a C long. You get the same kind of error in other
    places where PyLong_AsLong or PyLong_AsInt is called on a
    user-supplied value, for example:

        >>> import pickle
        >>> pickle.dumps(10**19, 10**19)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        OverflowError: Python int too large to convert to C long

    @mdickinson
    Copy link
    Member

    Thinking about it, ValueError seems like the right exception type: nothing's actually overflowing, because we haven't even tried to do any computation. This is just a limitation of what math.factorial is prepared to accept as input (and perhaps that limitation should be documented).

    Here's a patch. Bikeshedding about the exception type and the exception message is welcome.

    @mdickinson mdickinson added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Feb 7, 2014
    @vstinner
    Copy link
    Member

    vstinner commented Feb 7, 2014

    OverflowError makes sense because math.factorial(10**19) will overflow in CPython on common platforms, even if it didn't overflowed yet.

    On a supercomputer with a different Python implementation, you may be able to compute it.

    IMO An OverflowError is specific to a platform and Python implementation, whereas ValueError is "portable": any Python implementation must raise such error.

    I can imagine that a Python implementation may return a pseudo-int type which is supposed to be the result of math.factorial(), so you can compute for example math.factorial(10**19) % 2 (hint: result is 0).

    @mdickinson
    Copy link
    Member

    Okay; OverflowError seems to have the majority vote. I'll change it to that.

    Any strong opinions on the message?

    @pitrou
    Copy link
    Member

    pitrou commented Feb 7, 2014

    "Outrageously" sounds like the argument is immoral, but do we have an objective test for that? :-)

    (nitpicking a bit: negative values should probably raise a proper ValueError, no?)

    @mdickinson
    Copy link
    Member

    nitpicking a bit: negative values should probably raise a proper ValueError, no?

    I think they do, with this patch. Or maybe I'm misunderstanding?

    With the current form of the patch:

    >>> math.factorial(10**20)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: factorial() argument outrageously large
    >>> math.factorial(-10**20)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: factorial() not defined for negative values

    @pitrou
    Copy link
    Member

    pitrou commented Feb 7, 2014

    Ah, well, my bad. I just wanted to have a good argument :-)

    @mdickinson
    Copy link
    Member

    Updated patch.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Feb 7, 2014

    I slightly favor the ValueError patch because there is only a single exception
    to catch. PyLong_AsUnsignedLong() also raises OverflowError for both positive
    values that are too large and for negative values.

    Perhaps the error message could contain the actual range of allowed values,
    especially since LONG_MAX differs across platforms.

    The drawback is that "outrageously" would have to go then. :)

    @garybernhardt
    Copy link
    Mannequin

    garybernhardt mannequin commented Feb 7, 2014

    Although I thoroughly enjoyed "outrageously", I agree with Stefan about including the range of values. As a user who doesn't know the implementation, "outrageously" will just leave me asking why, but indicating the range will tell me exactly why the exception was raised.

    @mdickinson
    Copy link
    Member

    Updated patch.

    @mdickinson mdickinson self-assigned this Apr 9, 2014
    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 9, 2014

    Mark, you said:

    OverflowError seems to have the majority vote. I'll change it to that.

    But your most recent patch seems to have gone back to ValueError for both cases. Is there a reason for that? FWIW, I favor OverflowError as the "too large" type, though I prefer the less hyperbolic phrasing of the error message in the new patch.

    @mdickinson
    Copy link
    Member

    Josh: well spotted; thanks for picking up on this. I was wondering whether anyone would pick up on that, and I should have mentioned the change explicitly.

    I do think ValueError is the correct exception here. My interpretation of OverflowError is that it represents an overflow occurring during the computation, and that's not what's happening here: we're flat-out rejecting an input value because it's out of the range of values we accept. Stefan's message influenced me a bit in that direction, too.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 9, 2014

    I just think it's a little odd to make math.factorial uniquely sensitive to the documented definition of OverflowError. Basically every one of the non-masking PyLong_As<CType> interfaces uses OverflowError for too large values.

    In fact, all the functions which are converting to unsigned C types and aren't doing so for "program logic reasons" (e.g. the shift functions don't accept a negative shift due to semantic obscurity, and long_to_bytes doesn't accept a negative length for the output) use OverflowError when a negative value is passed as well.

    The PyArg_ParseTuple* functions raise OverflowError as well, either implicitly (when they call a PyLong_As<CType> function), or explicitly for those functions that impose more restrictive ranges than the PyLong function they wrap.

    Long story short: The majority of C extension functions that convert to C level types are raising OverflowError to mean "out of range for implementation's chosen C type" (whether too large, too small, or negative when only unsigned values excepted) as a side-effect of using Python's documented APIs. Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.

    @mdickinson
    Copy link
    Member

    Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.

    Can you point to some examples of those similar functions? I can't think of any obvious examples offhand.

    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Apr 10, 2014

    A few examples (some are patently ridiculous, since the range of values anyone would use ends long before you'd overflow a 32 bit integer, let alone a 64 bit value on my build of Python, but bear with me:

    >>> datetime.datetime(2**64, 1, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
    >>> datetime.datetime(-2**64, 1, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
    
    >>> time.mktime(time.struct_time([2**64]*9))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
    
    >>> sqlite3.enable_callback_tracebacks(2**64)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

    (That last one should really be changed to type code 'p' over 'i', or to 'B' since it's just a boolean, so overflow doesn't matter, just truthy/falsy behavior)

    It also happens if you pass re functions/methods a too large flags value:
    >>> re.sub(r'(abc)', r'\1', 'abcd', re.IGNORECASE << 64)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/shadowranger/src/cpython/Lib/re.py", line 175, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    OverflowError: Python int too large to convert to C ssize_t

    Skipping the tracebacks, a few more examples of functions where at least one argument can raise OverflowError for implementation specific reasons, rather than a logical overflow of some kind (which is what math.factorial's is):

    os.getpriority, os.setpriority, os.waitid, os.tcsetpgrp, a central utility function (get_data) in zipimport (I believe the value it's parsing is derived from a zip header, so it may not be possible to feed it too-large values; haven't checked), quite a number of socket functions (often for stuff that should really be ValueErrors, e.g. port numbers out of range), and more random things. I found all of these with a simple:

    find cpython/ -type f -name '*.c' -exec grep -nP 'PyArg_Parse.*?"\w*?[bhilL]"' {} + > exampleoverflow.txt

    There aren't any other good examples in math, largely because the other functions there deal with floats (or have arbitrary precision integer fallback paths, in the case of the log suite of functions).

    That find only scratches the surface; many PyArg_Parse* calls are split across lines (so my simple regex won't catch them), and old Python code has a habit of not using PyArg_Parse* even when it makes sense (presumably because they wanted to customize error messages, or didn't like the way the provided formatting codes handled edge cases).

    In reality, any place PyLong_As* is called (when * is not one of the masking functions) on an argument that came from the user without explicitly checking for an replacing OverflowError will potentially trigger this issue. A cursory search of locations where this function is called reveals OverflowErrors in the r parameter to to itertools.permutations, and that decimal is riddled with cases where they return if PyLong_As* has an error (including OverflowError) without changing the exception type, then a second round of range checking will set ValueError if it didn't Overflow. Examples include Context object's prec and clamp properties, but there are a dozen or more functions doing this, and I don't know if all of them are publically accessible.

    Fewer of the calls will be publically visible, so there's more to look through, but you can run the same search to find tons of places with potentially similar behavior:

    find cpython/ -type f -name '*.c' -exec grep -nP 'Py(Long|Number)As(?!.*(?:Mask|NULL|PyExc(?!Overflow)))' {} + > exampleoverflowdirectcall.txt

    I suspect that for every case where Python standard libs behave this way (raising OverflowErrors in ways that disregard the official docs description of when it should be used), there are a dozen where a third party module behaves this way, since third party modules are more likely to use the standardized argument parsing and numeric parsing APIs without rejiggering the default exceptions, assuming that the common APIs raise the "correct" errors.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 10, 2014

    OverflowError during initialization dates back to the early days of Python:

    http://hg.python.org/cpython/rev/0437738279a8

    Integer arithmetic actually did raise OverflowError back then:

    >>> 2222222222222 * 22222222222222
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OverflowError: integer multiplication
    >>>

    This of course no longer happens, so the whole idea of OverflowError seems
    slightly outdated.

    @mdickinson
    Copy link
    Member

    Looks like this harmless-looking issue has opened up a can of worms.

    the whole idea of OverflowError seems slightly outdated.

    Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

    >>> from math import exp
    >>> exp(5000.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: math range error

    I'd treat conversion to float as a special case of this; that is, the following is another case where OverflowError is (IMO) the right thing to use:

    >>> float(10**1000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: long int too large to convert to float

    A less clear-cut case:

    >>> sqrt(10**500)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: long int too large to convert to float

    This one still seems okay to me, even though the sqrt result *is* within the floating-point range. You can read this as two operations: a conversion from the integer input to a float, followed by a floating-point square root operation, and it's the intermediate result that overflows.

    And a case that's clearly wrong:

    >>> 53 >> 10**100  # Why not simply return 0?
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t

    Another historical oddity is that OverflowError inherits from ArithmeticError (along with FloatingPointError and ZeroDivisionError), which supports its use for arithmetic and mathematical operations that overflow their target type.

    >>> OverflowError.__mro__
    (<class 'OverflowError'>, <class 'ArithmeticError'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>)

    In the interests of moving this *particular* issue forward, I'll revert to OverflowError for this patch. I still personally think it's the wrong exception, but:

    (1) It's not more wrong than before: we were already raising OverflowError for large positive values.
    (2) It achieves the main aim of replacing an obscure error message with a more understandable one (which is what Nick wanted in the first place: msg210454).
    (3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion.

    We can still make the change from OverflowError to ValueError at some later point once it's clear that that's the right thing to do. But converting from OverflowError to ValueError now, and then deciding later that that was wrong and converting back to OverflowError would be a horrible thing to do.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 10, 2014

    Mark Dickinson <report@bugs.python.org> wrote:

    > the whole idea of OverflowError seems slightly outdated.

    Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

    Indeed, I was focusing on integer arithmetic and assignments. For float
    operations and conversions OverflowError is quite natural.

    (3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion.

    Using OverflowError for now sounds like a good plan.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 10, 2014

    New changeset 273e17260d25 by Mark Dickinson in branch 'default':
    Issue bpo-20539: Improve math.factorial error messages and types for large inputs.
    http://hg.python.org/cpython/rev/273e17260d25

    @mdickinson
    Copy link
    Member

    Patch applied to the default branch (but with OverflowError instead of ValueError for large positive inputs). I don't see a lot of benefit in applying the fix to the maintenance branches. Closing this issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants