New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More accurate estimation of the number of digits in int to decimal string conversion #69588
Comments
Int to decimal string conversion (function long_to_decimal_string_internal() at Objects/longobject.c:1583) has a limitation. On 32-bit platform you can't convert integers larger than 2**2**31 (10**646456993). Proposed patch removes this limitation [*]. It also decreases memory requirements for intermediate buffer on 10%. The size of intermediate buffer (in digits) depends on the size of the integer. Unpatched: For 15-bit digits: size*15/4/3 = size*1.25 Patched: [*] Converting such large integers to decimal string can be not finished for reasonable time, because it has quadratic complexity. On my netbook the estimated time of calculating str(2**2**31) is 5 years. But this is different issue. |
Currently, the code uses Py_ABS(Py_SIZE(x))*PyLong_SHIFT to estimate the upper-bound of the number of bits of the number x. It's a raw estimation, the difference can be up to 29 bits. We may try to compute the exact number of bits, x.bit_length(). Python 3.5 estimate the number of "decimalbase" (10**9) digits using: def decimalbase_digits1(x):
bits = size(x) * PyLong_SHIFT
return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT) I wrote a test to compute how many digits are overallocated (and unused): 552961 for this function. I'm not sure that "1+" is needed, since 3.0 is already lower than log2(10) (3.32...). If we compute the exact number of bits using the Python 3.5 function, it's a little bit better: def decimalbase_digits2(x):
bits = x.bit_length()
return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT) => 546250 digits (1% less). You propose a better estimation: def decimalbase_digits3(x):
digits = size(x)
d = (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
return 1 + digits + digits // d With your estimation, only 504243 are overallocated (9% less than Python 3.5 function). But why only using 2 digits for log2(10) estimation? We can more digits: def decimalbase_digits4(x):
bits = size(x) * PyLong_SHIFT
return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT) => 491908 digits (11% less) According to my tests, the best function uses the number of bits and the better estimation of log2(10): def new_decimalbase_digits5(x):
bits = x.bit_length()
return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT) => 483424 digits (13% less) See attached for my tests. |
Because the difference between 3 and 3.3 is 10%, and the difference between 3.3 and exact log2(10) is only 1%. Yes, we can use more digits, but the effect of any additional digit is decreased in geometric progression. If use simple and fast formula that avoids integer multiplication overflow "digits + digits // d", the effect of additional precision is virtually vanished. >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
>>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
4
>>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
7
>>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
7
>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
>>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
9
>>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
99
>>> (332 * _PyLong_DECIMAL_SHIFT) // (100 * PyLong_SHIFT - 332 * _PyLong_DECIMAL_SHIFT)
249
>>> (3321 * _PyLong_DECIMAL_SHIFT) // (1000 * PyLong_SHIFT - 3321 * _PyLong_DECIMAL_SHIFT)
269
>>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
290
>>> (332192 * _PyLong_DECIMAL_SHIFT) // (100000 * PyLong_SHIFT - 332192 * _PyLong_DECIMAL_SHIFT)
291
>>> (332192809 * _PyLong_DECIMAL_SHIFT) // (100000000 * PyLong_SHIFT - 332192809 * _PyLong_DECIMAL_SHIFT)
291 The best approximation with minimal denominator is 485/146. >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
>>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
7
>>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
>>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
291 |
New changeset 1902e1d79e25 by Mark Dickinson in branch 'default': |
Patch and analysis LGTM. Thanks! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: