Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More accurate estimation of the number of digits in int to decimal string conversion #69588

Closed
serhiy-storchaka opened this issue Oct 14, 2015 · 5 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Copy link
Member

BPO 25402
Nosy @mdickinson, @vstinner, @serhiy-storchaka
Files
  • long_to_decimal_string_number_of_digits.patch
  • estimate_decimalbase_digits.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2016-08-29.16:30:48.811>
    created_at = <Date 2015-10-14.11:42:43.861>
    labels = ['interpreter-core', 'type-bug']
    title = 'More accurate estimation of the number of digits in int to decimal string conversion'
    updated_at = <Date 2016-08-29.16:30:48.810>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2016-08-29.16:30:48.810>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2016-08-29.16:30:48.811>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2015-10-14.11:42:43.861>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['40782', '40786']
    hgrepos = []
    issue_num = 25402
    keywords = ['patch']
    message_count = 5.0
    messages = ['252985', '253036', '253037', '273869', '273870']
    nosy_count = 4.0
    nosy_names = ['mark.dickinson', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25402'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    Int to decimal string conversion (function long_to_decimal_string_internal() at Objects/longobject.c:1583) has a limitation. On 32-bit platform you can't convert integers larger than 2**2**31 (10**646456993). Proposed patch removes this limitation [*].

    It also decreases memory requirements for intermediate buffer on 10%. The size of intermediate buffer (in digits) depends on the size of the integer. Unpatched:

    For 15-bit digits: size*15/4/3 = size*1.25
    For 30-bit digits: size*30/9/3 = size*1.11

    Patched:
    For 15-bit digits: size*15/4/3.3 = size*1.14
    For 30-bit digits: size*30/9/3.3 = size*1.01

    [*] Converting such large integers to decimal string can be not finished for reasonable time, because it has quadratic complexity. On my netbook the estimated time of calculating str(2**2**31) is 5 years. But this is different issue.

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Oct 14, 2015
    @vstinner
    Copy link
    Member

    Currently, the code uses Py_ABS(Py_SIZE(x))*PyLong_SHIFT to estimate the upper-bound of the number of bits of the number x. It's a raw estimation, the difference can be up to 29 bits. We may try to compute the exact number of bits, x.bit_length().

    Python 3.5 estimate the number of "decimalbase" (10**9) digits using:

    def decimalbase_digits1(x):
        bits = size(x) * PyLong_SHIFT
        return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT)

    I wrote a test to compute how many digits are overallocated (and unused): 552961 for this function. I'm not sure that "1+" is needed, since 3.0 is already lower than log2(10) (3.32...). If we compute the exact number of bits using the Python 3.5 function, it's a little bit better:

    def decimalbase_digits2(x):
        bits = x.bit_length()
        return 1 + bits // (3 * _PyLong_DECIMAL_SHIFT)

    => 546250 digits (1% less). You propose a better estimation:

    def decimalbase_digits3(x):
        digits = size(x)
        d = (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
        return 1 + digits + digits // d

    With your estimation, only 504243 are overallocated (9% less than Python 3.5 function). But why only using 2 digits for log2(10) estimation? We can more digits:

    def decimalbase_digits4(x):
        bits = size(x) * PyLong_SHIFT
        return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT)

    => 491908 digits (11% less)

    According to my tests, the best function uses the number of bits and the better estimation of log2(10):

    def new_decimalbase_digits5(x):
        bits = x.bit_length()
        return bits * 10000 // (33219 * _PyLong_DECIMAL_SHIFT)

    => 483424 digits (13% less)

    See attached for my tests.

    @serhiy-storchaka
    Copy link
    Member Author

    But why only using 2 digits for log2(10) estimation?

    Because the difference between 3 and 3.3 is 10%, and the difference between 3.3 and exact log2(10) is only 1%. Yes, we can use more digits, but the effect of any additional digit is decreased in geometric progression.

    If use simple and fast formula that avoids integer multiplication overflow "digits + digits // d", the effect of additional precision is virtually vanished.

    >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
    >>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
    4
    >>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
    7
    >>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
    7
    
    >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
    >>> (3 * _PyLong_DECIMAL_SHIFT) // (1 * PyLong_SHIFT - 3 * _PyLong_DECIMAL_SHIFT)
    9
    >>> (33 * _PyLong_DECIMAL_SHIFT) // (10 * PyLong_SHIFT - 33 * _PyLong_DECIMAL_SHIFT)
    99
    >>> (332 * _PyLong_DECIMAL_SHIFT) // (100 * PyLong_SHIFT - 332 * _PyLong_DECIMAL_SHIFT)
    249
    >>> (3321 * _PyLong_DECIMAL_SHIFT) // (1000 * PyLong_SHIFT - 3321 * _PyLong_DECIMAL_SHIFT)
    269
    >>> (33219 * _PyLong_DECIMAL_SHIFT) // (10000 * PyLong_SHIFT - 33219 * _PyLong_DECIMAL_SHIFT)
    290
    >>> (332192 * _PyLong_DECIMAL_SHIFT) // (100000 * PyLong_SHIFT - 332192 * _PyLong_DECIMAL_SHIFT)
    291
    >>> (332192809 * _PyLong_DECIMAL_SHIFT) // (100000000 * PyLong_SHIFT - 332192809 * _PyLong_DECIMAL_SHIFT)
    291

    The best approximation with minimal denominator is 485/146.

    >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 15, 4
    >>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
    7
    >>> PyLong_SHIFT, _PyLong_DECIMAL_SHIFT = 30, 9
    >>> (485 * _PyLong_DECIMAL_SHIFT) // (146 * PyLong_SHIFT - 485 * _PyLong_DECIMAL_SHIFT)
    291

    @bitdancer bitdancer changed the title Accurater estimation of the number of digits in int to decimal string conversion More accurate estimation of the number of digits in int to decimal string conversion Oct 15, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 29, 2016

    New changeset 1902e1d79e25 by Mark Dickinson in branch 'default':
    Issue bpo-25402: in int-to-decimal-string conversion, reduce intermediate storage requirements and relax restriction on converting large integers. Patch by Serhiy Storchaka.
    https://hg.python.org/cpython/rev/1902e1d79e25

    @mdickinson
    Copy link
    Member

    Patch and analysis LGTM. Thanks!

    @mdickinson mdickinson self-assigned this Aug 29, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants