More accurate estimation of the number of digits in int to decimal string conversion #69588

serhiy-storchaka · 2015-10-14T11:42:44Z

BPO	25402
Nosy	@mdickinson, @vstinner, @serhiy-storchaka
Files	long_to_decimal_string_number_of_digits.patch estimate_decimalbase_digits.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/mdickinson'
closed_at = <Date 2016-08-29.16:30:48.811>
created_at = <Date 2015-10-14.11:42:43.861>
labels = ['interpreter-core', 'type-bug']
title = 'More accurate estimation of the number of digits in int to decimal string conversion'
updated_at = <Date 2016-08-29.16:30:48.810>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2016-08-29.16:30:48.810>
actor = 'mark.dickinson'
assignee = 'mark.dickinson'
closed = True
closed_date = <Date 2016-08-29.16:30:48.811>
closer = 'mark.dickinson'
components = ['Interpreter Core']
creation = <Date 2015-10-14.11:42:43.861>
creator = 'serhiy.storchaka'
dependencies = []
files = ['40782', '40786']
hgrepos = []
issue_num = 25402
keywords = ['patch']
message_count = 5.0
messages = ['252985', '253036', '253037', '273869', '273870']
nosy_count = 4.0
nosy_names = ['mark.dickinson', 'vstinner', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue25402'
versions = ['Python 3.6']

serhiy-storchaka · 2015-10-14T11:42:43Z

Int to decimal string conversion (function long_to_decimal_string_internal() at Objects/longobject.c:1583) has a limitation. On 32-bit platform you can't convert integers larger than 2**2**31 (10**646456993). Proposed patch removes this limitation [*].

It also decreases memory requirements for intermediate buffer on 10%. The size of intermediate buffer (in digits) depends on the size of the integer. Unpatched:

For 15-bit digits: size*15/4/3 = size*1.25
For 30-bit digits: size*30/9/3 = size*1.11

Patched:
For 15-bit digits: size*15/4/3.3 = size*1.14
For 30-bit digits: size*30/9/3.3 = size*1.01

[*] Converting such large integers to decimal string can be not finished for reasonable time, because it has quadratic complexity. On my netbook the estimated time of calculating str(2**2**31) is 5 years. But this is different issue.

vstinner · 2015-10-15T08:34:14Z

Currently, the code uses Py_ABS(Py_SIZE(x))*PyLong_SHIFT to estimate the upper-bound of the number of bits of the number x. It's a raw estimation, the difference can be up to 29 bits. We may try to compute the exact number of bits, x.bit_length().

Python 3.5 estimate the number of "decimalbase" (10**9) digits using:

def decimalbase_digits1(x):
    bits = size(x) * PyLong_SHIFT
    return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT)

I wrote a test to compute how many digits are overallocated (and unused): 552961 for this function. I'm not sure that "1+" is needed, since 3.0 is already lower than log2(10) (3.32...). If we compute the exact number of bits using the Python 3.5 function, it's a little bit better:

def decimalbase_digits2(x):
    bits = x.bit_length()
    return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT)

=> 546250 digits (1% less). You propose a better estimation:

def decimalbase_digits3(x):
    digits = size(x)
    d = (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
    return 1 + digits + digits // d

With your estimation, only 504243 are overallocated (9% less than Python 3.5 function). But why only using 2 digits for log2(10) estimation? We can more digits:

def decimalbase_digits4(x):
    bits = size(x) * PyLong_SHIFT
    return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT)

=> 491908 digits (11% less)

According to my tests, the best function uses the number of bits and the better estimation of log2(10):

def new_decimalbase_digits5(x):
    bits = x.bit_length()
    return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT)

=> 483424 digits (13% less)

See attached for my tests.

serhiy-storchaka · 2015-10-15T09:40:01Z

But why only using 2 digits for log2(10) estimation?

Because the difference between 3 and 3.3 is 10%, and the difference between 3.3 and exact log2(10) is only 1%. Yes, we can use more digits, but the effect of any additional digit is decreased in geometric progression.

If use simple and fast formula that avoids integer multiplication overflow "digits + digits // d", the effect of additional precision is virtually vanished.

>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
>>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
4
>>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
7
>>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
7

>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
>>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
9
>>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
99
>>> (332 * _PyLong_DECIMAL_SHIFT) // (100 * PyLong_SHIFT - 332 * _PyLong_DECIMAL_SHIFT)
249
>>> (3321 * _PyLong_DECIMAL_SHIFT) // (1000 * PyLong_SHIFT - 3321 * _PyLong_DECIMAL_SHIFT)
269
>>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
290
>>> (332192 * _PyLong_DECIMAL_SHIFT) // (100000 * PyLong_SHIFT - 332192 * _PyLong_DECIMAL_SHIFT)
291
>>> (332192809 * _PyLong_DECIMAL_SHIFT) // (100000000 * PyLong_SHIFT - 332192809 * _PyLong_DECIMAL_SHIFT)
291

The best approximation with minimal denominator is 485/146.

>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
>>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
7
>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
>>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
291

python-dev · 2016-08-29T16:26:58Z

New changeset 1902e1d79e25 by Mark Dickinson in branch 'default':
Issue bpo-25402: in int-to-decimal-string conversion, reduce intermediate storage requirements and relax restriction on converting large integers. Patch by Serhiy Storchaka.
https://hg.python.org/cpython/rev/1902e1d79e25

mdickinson · 2016-08-29T16:30:49Z

Patch and analysis LGTM. Thanks!

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Oct 14, 2015

bitdancer changed the title ~~Accurater estimation of the number of digits in int to decimal string conversion~~ More accurate estimation of the number of digits in int to decimal string conversion Oct 15, 2015

mdickinson closed this as completed Aug 29, 2016

mdickinson self-assigned this Aug 29, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More accurate estimation of the number of digits in int to decimal string conversion #69588

More accurate estimation of the number of digits in int to decimal string conversion #69588

serhiy-storchaka commented Oct 14, 2015

serhiy-storchaka commented Oct 14, 2015

vstinner commented Oct 15, 2015

serhiy-storchaka commented Oct 15, 2015

python-dev mannequin commented Aug 29, 2016

mdickinson commented Aug 29, 2016

More accurate estimation of the number of digits in int to decimal string conversion #69588

More accurate estimation of the number of digits in int to decimal string conversion #69588

Comments

serhiy-storchaka commented Oct 14, 2015

serhiy-storchaka commented Oct 14, 2015

vstinner commented Oct 15, 2015

serhiy-storchaka commented Oct 15, 2015

python-dev mannequin commented Aug 29, 2016

mdickinson commented Aug 29, 2016