Issue 25402: More accurate estimation of the number of digits in int to decimal string conversion

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69588

classification

Title:	More accurate estimation of the number of digits in int to decimal string conversion
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	mark.dickinson, python-dev, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2015-10-14 11:42 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
long_to_decimal_string_number_of_digits.patch	serhiy.storchaka, 2015-10-14 11:42		review
estimate_decimalbase_digits.py	vstinner, 2015-10-15 08:34

Messages (5)
msg252985 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-10-14 11:42
Int to decimal string conversion (function long_to_decimal_string_internal() at Objects/longobject.c:1583) has a limitation. On 32-bit platform you can't convert integers larger than 2231 (10*646456993). Proposed patch removes this limitation []. It also decreases memory requirements for intermediate buffer on 10%. The size of intermediate buffer (in digits) depends on the size of the integer. Unpatched: For 15-bit digits: size15/4/3 = size1.25 For 30-bit digits: size30/9/3 = size1.11 Patched: For 15-bit digits: size15/4/3.3 = size1.14 For 30-bit digits: size30/9/3.3 = size1.01 [] Converting such large integers to decimal string can be not finished for reasonable time, because it has quadratic complexity. On my netbook the estimated time of calculating str(22*31) is 5 years. But this is different issue.
msg253036 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-10-15 08:34
Currently, the code uses Py_ABS(Py_SIZE(x))PyLong_SHIFT to estimate the upper-bound of the number of bits of the number x. It's a raw estimation, the difference can be up to 29 bits. We may try to compute the exact number of bits, x.bit_length(). Python 3.5 estimate the number of "decimalbase" (109) digits using: def decimalbase_digits1(x): bits = size(x) PyLong_SHIFT return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT) I wrote a test to compute how many digits are overallocated (and unused): 552961 for this function. I'm not sure that "1+" is needed, since 3.0 is already lower than log2(10) (3.32...). If we compute the exact number of bits using the Python 3.5 function, it's a little bit better: def decimalbase_digits2(x): bits = x.bit_length() return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT) => 546250 digits (1% less). You propose a better estimation: def decimalbase_digits3(x): digits = size(x) d = (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT) return 1 + digits + digits // d With your estimation, only 504243 are overallocated (9% less than Python 3.5 function). But why only using 2 digits for log2(10) estimation? We can more digits: def decimalbase_digits4(x): bits = size(x) * PyLong_SHIFT return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT) => 491908 digits (11% less) According to my tests, the best function uses the number of bits and the better estimation of log2(10): def new_decimalbase_digits5(x): bits = x.bit_length() return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT) => 483424 digits (13% less) See attached for my tests.
msg253037 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-10-15 09:40
> But why only using 2 digits for log2(10) estimation? Because the difference between 3 and 3.3 is 10%, and the difference between 3.3 and exact log2(10) is only 1%. Yes, we can use more digits, but the effect of any additional digit is decreased in geometric progression. If use simple and fast formula that avoids integer multiplication overflow "digits + digits // d", the effect of additional precision is virtually vanished. >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4 >>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT) 4 >>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT) 7 >>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT) 7 >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9 >>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT) 9 >>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT) 99 >>> (332 * _PyLong_DECIMAL_SHIFT) // (100 * PyLong_SHIFT - 332 * _PyLong_DECIMAL_SHIFT) 249 >>> (3321 * _PyLong_DECIMAL_SHIFT) // (1000 * PyLong_SHIFT - 3321 * _PyLong_DECIMAL_SHIFT) 269 >>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT) 290 >>> (332192 * _PyLong_DECIMAL_SHIFT) // (100000 * PyLong_SHIFT - 332192 * _PyLong_DECIMAL_SHIFT) 291 >>> (332192809 * _PyLong_DECIMAL_SHIFT) // (100000000 * PyLong_SHIFT - 332192809 * _PyLong_DECIMAL_SHIFT) 291 The best approximation with minimal denominator is 485/146. >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4 >>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT) 7 >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9 >>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT) 291
msg273869 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-08-29 16:26
New changeset 1902e1d79e25 by Mark Dickinson in branch 'default': Issue #25402: in int-to-decimal-string conversion, reduce intermediate storage requirements and relax restriction on converting large integers. Patch by Serhiy Storchaka. https://hg.python.org/cpython/rev/1902e1d79e25
msg273870 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-29 16:30
Patch and analysis LGTM. Thanks!

History
Date	User	Action	Args
2022-04-11 14:58:22	admin	set	github: 69588
2016-08-29 16:30:48	mark.dickinson	set	status: open -> closed messages: + msg273870 assignee: mark.dickinson resolution: fixed stage: patch review -> resolved
2016-08-29 16:26:58	python-dev	set	nosy: + python-dev messages: + msg273869
2015-10-15 11:04:24	r.david.murray	set	title: Accurater estimation of the number of digits in int to decimal string conversion -> More accurate estimation of the number of digits in int to decimal string conversion
2015-10-15 09:40:01	serhiy.storchaka	set	messages: + msg253037
2015-10-15 08:34:15	vstinner	set	files: + estimate_decimalbase_digits.py messages: + msg253036
2015-10-15 07:29:37	vstinner	set	nosy: + vstinner
2015-10-14 11:42:43	serhiy.storchaka	create