
Decimal.__int__ overflows for large values #45291

Closed
aryx mannequin opened this issue Aug 8, 2007 · 10 comments

aryx mannequin commented Aug 8, 2007

BPO 1770416
Nosy @birkenfeld, @facundobatista, @mdickinson, @devdanzin

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/facundobatista'
closed_at = <Date 2007-11-24.01:01:52.862>
created_at = <Date 2007-08-08.20:43:22.000>
labels = ['library']
title = 'Decimal.__int__ overflows for large values'
updated_at = <Date 2007-11-24.01:01:52.861>
user = 'https://bugs.python.org/aryx'

bugs.python.org fields:

activity = <Date 2007-11-24.01:01:52.861>
actor = 'facundobatista'
assignee = 'facundobatista'
closed = True
closed_date = <Date 2007-11-24.01:01:52.862>
closer = 'facundobatista'
components = ['Library (Lib)']
creation = <Date 2007-08-08.20:43:22.000>
creator = 'aryx'
dependencies = []
files = []
hgrepos = []
issue_num = 1770416
keywords = []
message_count = 10.0
messages = ['32602', '32603', '32604', '32605', '32606', '32607', '32608', '32609', '32610', '57797']
nosy_count = 5.0
nosy_names = ['georg.brandl', 'facundobatista', 'mark.dickinson', 'aryx', 'ajaksu2']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1770416'
versions = []


aryx mannequin commented Aug 8, 2007

This also affects Decimal.__hash__, since it [indirectly] calls Decimal.__int__.

>>> from decimal import Decimal as D
>>> e = D("1e1234567890987654321")
>>> int(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/decimal.py", line 1501, in __int__
    s = ''.join(map(str, self._int)) + '0'*self._exp
OverflowError: cannot fit 'long' into an index-sized integer
>>> e = D("1e1234567890")
>>> int(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/decimal.py", line 1501, in __int__
    s = ''.join(map(str, self._int)) + '0'*self._exp
MemoryError

Also, for values that do work but are still fairly large, the conversion is incredibly slow.

@aryx aryx mannequin assigned facundobatista Aug 8, 2007
@aryx aryx mannequin added the stdlib Python modules in the Lib dir label Aug 8, 2007

devdanzin mannequin commented Aug 9, 2007

Hi Jason,
The OverflowError is about "index-sized ints", that is, ints that are valid indexes for sequences; try:
>>> e = "0" * 1234567890

So it seems that this error prevents the creation of a string of length 1234567890, which is a good thing (sorta) :)

I once tried to implement a dec2long function based on numbers instead of strings; see if it helps (it's VERY slow and naive, but IIRC it was a bit faster than the original version, and correct): http://groups.google.com/group/comp.lang.python/msg/aba7264ab38eb25e

Now, do you really need all that precision for such huge numbers? I know I didn't ;)
Daniel


aryx mannequin commented Aug 9, 2007

Hey Daniel,

The bigger issue for us is that Decimal.__hash__ is calling Decimal.__int__, not that we want an integer/long version of a very large Decimal. We never actually cast the Decimal to an int/long explicitly. I wouldn't have any issue if Decimal.__int__ remained as it is, but I think it would be a good idea for Decimal.__hash__ to do something different. Probably something fast and simple, such as hash(self.as_tuple()), self being a Decimal of course.

Our project is a CAS and we use Decimal for our real number class. I happened to run into this issue while implementing an approximation of log(x) for extremely large/small values of x. I just started keyboard-bashing numbers and, behold, Decimal crashed on me :)


devdanzin mannequin commented Aug 9, 2007

I see. Inheriting from Decimal and overloading __hash__ is a way to solve your problem, but it's IMHO a shallow bug and worth reporting.
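
For illustration, here's a rough sketch of that subclassing workaround (the class name is made up; note that, as discussed further below, hashing the tuple form breaks hash compatibility with ints):

from decimal import Decimal

class HashableDecimal(Decimal):
    """Decimal subclass whose hash avoids the int(self) conversion."""
    def __hash__(self):
        # Hash the (sign, digits, exponent) tuple instead of int(self).
        # Caveat: equal values may hash differently (Decimal("1") vs
        # Decimal("1.0")), and hash(d) != hash(int(d)) in general.
        return hash(self.as_tuple())

d = HashableDecimal("1e1234567890987654321")
h = hash(d)   # returns immediately; no huge string or integer is built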

I just tried hash(D.as_tuple()) and it is blazing fast. Unless the official line is "don't touch decimal.py until X", I think this change to hashing would be useful and (AFAICT) harmless enough to fit into e.g. 2.5.2. To avoid incompatibilities, __hash__ could catch the OverflowError and only use .as_tuple() for values above the previous maximum (keeping __hash__ slow, unfortunately, for values below it).

Could the current status of Decimal be made a bit more clear? Are bug reports/patches welcome? Is bugging Facundo or RHettinger welcome? :)

If making __int__ a bit faster and able to convert without building huge strings is desired, I've updated that old function (see below), and AFAIK it could replace Decimal.__int__/__long__ in Lib/decimal.py. It is about ten times faster in the best cases and about as slow in the worst cases (the branch that truncates excess decimal digits was the dubious part, but it seems easy to get right). Since the original __int__ optimizes str(Decimal._int) and doesn't split/check for substrings, using the same path should speed this up further. I can run the tests and benchmark it (next month...) if there's interest.

def dec2long(number):
    """ Convert decimal.Decimal to long (abridged, non-checking version)"""
    decimal_string = str(number)
    if "e" in decimal_string:
        radix, exponent = decimal_string.split("e")
    elif "E" in decimal_string:
        radix, exponent = decimal_string.split("E")
    else:
        radix, exponent = (decimal_string, 0)
    if exponent:
        exponent = int(exponent)
        if "." in radix:
            rint, rdec = radix.split(".")
            radix_decimal_part_len = long(len(rdec))
            if radix_decimal_part_len <= exponent:
                radix_as_long = long(rint + rdec)
                corrected_exponent = exponent - radix_decimal_part_len
                result = radix_as_long * 10L ** corrected_exponent
            else:
                # fewer exponent shifts than decimal digits: keep only the
                # digits left of the shifted decimal point, truncating toward
                # zero (for str(Decimal) output a negative exponent here
                # means abs(number) < 1)
                if exponent > 0:
                    result = long(rint + rdec[:exponent])
                else:
                    result = 0L
        else:
            radix_as_long = long(radix)
            result = radix_as_long * 10L**exponent
    else:
        if "." in radix:
            radix_integer_part = long(radix.split(".")[0])
        else:
            radix_integer_part = long(radix)
        result = radix_integer_part
    return result

As a comparison, here's __int__ (abridged) from decimal.py:
def __int__(number):
    """Converts self to an int, truncating if necessary."""
    if number._exp >= 0:
        s = ''.join(map(str, number._int)) + '0'*number._exp
    else:
        s = ''.join(map(str, number._int))[:number._exp]
    if s == '':
        s = '0'
    sign = '-'*number._sign
    return int(sign + s)

@birkenfeld (Member) commented:

Assigning to Facundo; he's actively working on decimal ATM.

@mdickinson (Member) commented:

Doesn't using hash(D.as_tuple()) break the principle that if two objects compare equal then they should have equal hash?

An alternative to converting to long before hashing is to use the fact that, for the current hash implementation for long, we have hash(n) == hash(n % (2**32-1)) (except when n is a multiple of 2**32-1). For a Decimal d that's integral, one should be able to compute d % (2**32-1) very quickly: if d = c*10**e then just use (c * pow(10, e, 2**32-1)) % (2**32-1), which should be acceptably fast even when d = 123456789E999999999999999.

The only tricky bit is that on a 64-bit system all those 2**32-1 need to be replaced by 2**64-1. Though now that I come to think of it, since 2**32-1 is a factor of 2**64-1, it would be enough just to do everything modulo 2**64-1 even on a 32-bit system.
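
For illustration, a rough sketch of that modular reduction (just the d % (2**64-1) computation, not a hash function and not the eventual patch), assuming d is an integral Decimal with a non-negative exponent, so that int(d) == (-1)**sign * c * 10**e:

from decimal import Decimal

M = 2**64 - 1   # fine on 32-bit too, since 2**32 - 1 divides 2**64 - 1

def reduce_mod(d, m=M):
    """Compute int(d) % m without materializing the full integer."""
    sign, digits, exp = d.as_tuple()
    c = int(''.join(map(str, digits)))    # the coefficient c
    r = (c * pow(10, exp, m)) % m         # (c * 10**e) % m via modular exponentiation
    if sign:                              # sign == 1 means d is negative
        r = -r % m
    return r

r = reduce_mod(Decimal("123456789E999999999"))   # fast despite the huge exponent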

@mdickinson (Member) commented:

Mark Dickinson (marketdickinson) stupidly claimed that:

hash(n) == hash(n % (2**32-1))

It's not true. Sorry for the noise.

@mdickinson (Member) commented:

See patch bpo-1772851

@facundobatista (Member) commented:

hash will NOT be changed as is requested in this bug:

>>> import decimal
>>> d = decimal.Decimal(1)
>>> hash(d)
1
>>> hash(1)
1
>>> hash(d.as_tuple())
-790594979
>>> 

See this requirement from PEP-327:

hash(n) == hash(Decimal(n)) # Only if n is int, long, or Decimal

Regarding the overflow, this must be fixed... I'll take a look at marketdickinson's patch...

@facundobatista (Member) commented:

This was fixed at the same time as bpo-1772851.

int(D("1e1234567890987654321")) still takes too long, but that is the fault of having to compute 10**1234567890987654321 to convert it to an int.
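
For illustration, a rough sketch of why int() stays slow: however the conversion is implemented, the value is (-1)**sign * coefficient * 10**exponent, and the 10**exponent factor alone is an astronomically large integer for an exponent like 1234567890987654321 (a tame exponent is used below so it actually runs):

from decimal import Decimal

d = Decimal("1e50")
sign, digits, exp = d.as_tuple()
coefficient = int(''.join(map(str, digits)))
value = (-1) ** sign * coefficient * 10 ** exp   # the 10**exp term dominates the work
assert value == int(d)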

Note that hash(D("1e1234567890987654321")) is fast now.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022