Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Decimal.as_integer_ratio() #70116

Closed
johnwalker mannequin opened this issue Dec 22, 2015 · 27 comments
Closed

Add Decimal.as_integer_ratio() #70116

johnwalker mannequin opened this issue Dec 22, 2015 · 27 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@johnwalker
Copy link
Mannequin

johnwalker mannequin commented Dec 22, 2015

BPO 25928
Nosy @gvanrossum, @tim-one, @rhettinger, @terryjreedy, @mdickinson, @abalkin, @stevendaprano, @skrah, @serhiy-storchaka
Files
  • issue25928.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/skrah'
    closed_at = <Date 2015-12-29.12:51:14.799>
    created_at = <Date 2015-12-22.22:18:25.222>
    labels = ['type-feature', 'library']
    title = 'Add Decimal.as_integer_ratio()'
    updated_at = <Date 2016-04-05.10:10:41.814>
    user = 'https://bugs.python.org/johnwalker'

    bugs.python.org fields:

    activity = <Date 2016-04-05.10:10:41.814>
    actor = 'skrah'
    assignee = 'skrah'
    closed = True
    closed_date = <Date 2015-12-29.12:51:14.799>
    closer = 'skrah'
    components = ['Library (Lib)']
    creation = <Date 2015-12-22.22:18:25.222>
    creator = 'johnwalker'
    dependencies = []
    files = ['41438']
    hgrepos = []
    issue_num = 25928
    keywords = ['patch']
    message_count = 27.0
    messages = ['256873', '256874', '256914', '256917', '256932', '256933', '256934', '256936', '256937', '256938', '256940', '257003', '257073', '257088', '257090', '257094', '257097', '257115', '257137', '257141', '257147', '257149', '257150', '257151', '257179', '262887', '262896']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'tim.peters', 'rhettinger', 'terry.reedy', 'mark.dickinson', 'belopolsky', 'steven.daprano', 'skrah', 'python-dev', 'serhiy.storchaka', 'johnwalker']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue25928'
    versions = ['Python 3.6']

    @johnwalker
    Copy link
    Mannequin Author

    johnwalker mannequin commented Dec 22, 2015

    In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says:

    # FIXME This is faster than Fraction.from_decimal, but still too slow.

    Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers.

    This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly.

    In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))):

    def _decimal_to_ratio(d):
        sign, exp = d._sign, d._exp
        if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
            assert not d.is_finite()
            return (d, None)
        num = int(d._int)
        if exp < 0:
            den = 10**-exp
        else:
            num *= 10**exp
            den = 1
        if sign:
            num = -num
        return (num, den)

    If the performance improvement is considered worthwhile, here are a few solutions I see.

    1. Use _int directly in fractions and statistics.

    2. Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway.

    3. Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple.

    4. Deprecate _int in Decimal and add a new reference str_digits.

    There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods.

    Here is what I used for benchmarks:

    ========

    import timeit
    
    old_setup = """
    import random
    from decimal import Decimal
    
    def _decimal_to_ratio(d):
        sign, digits, exp = d.as_tuple()
        if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
            assert not d.is_finite()
            return (d, None)
        num = 0
        for digit in digits:
            num = num*10 + digit
        if exp < 0:
            den = 10**-exp
        else:
            num *= 10**exp
            den = 1
        if sign:
            num = -num
        return (num, den)
    
    def run_it():
        dec = Decimal(str(random.random()))
        _decimal_to_ratio(dec)
    """
    
    new_setup = """
    import random
    from decimal import Decimal
    
    def _decimal_to_ratio(d):
        sign, exp = d._sign, d._exp
        if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
            assert not d.is_finite()
            return (d, None)
        num = int(d._int)
        if exp < 0:
            den = 10**-exp
        else:
            num *= 10**exp
            den = 1
        if sign:
            num = -num
        return (num, den)
    
    def run_it():
        dec = Decimal(str(random.random()))
        _decimal_to_ratio(dec)
    """
    
    if __name__ == '__main__':
        print("Testing proposed implementation")
        print("number = 10000")
        print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000))
        print("number = 100000")         
        print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000))
        print("number = 1000000")     
        print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000))
    
        print("Testing old implementation")
        print("number = 10000")
        print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000))
        print("number = 100000")         
        print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000))
        print("number = 1000000")     
        print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))

    @johnwalker johnwalker mannequin added the stdlib Python modules in the Lib dir label Dec 22, 2015
    @johnwalker
    Copy link
    Mannequin Author

    johnwalker mannequin commented Dec 22, 2015

    Heres the output of running the benchmark on my machine:

    Testing proposed implementation
    number = 10000
    0.07098613299967838
    number = 100000
    0.6952260910002224
    number = 1000000
    6.948197601999709
    Testing current implementation
    number = 10000
    0.1418162760000996
    number = 100000
    1.350394603001405
    number = 1000000
    13.625065807000283

    @johnwalker johnwalker mannequin added the performance Performance or resource usage label Dec 22, 2015
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 23, 2015

    I guess there's some version mixup here: From Python 3.3 on
    the integrated C version of decimal does not store the digits
    as a string and does not have the private _int method.

    @serhiy-storchaka
    Copy link
    Member

    May be implement the as_integer_ratio() method and/or numerator and denominator attributes in the Decimal class?

    @stevendaprano
    Copy link
    Member

    On Wed, Dec 23, 2015 at 03:18:28PM +0000, Serhiy Storchaka wrote:

    May be implement the as_integer_ratio() method and/or numerator and
    denominator attributes in the Decimal class?

    That would also be good as it will decrease the API differences between
    floats and Decimals and make it easier to duck-type one for the other.

    @johnwalker
    Copy link
    Mannequin Author

    johnwalker mannequin commented Dec 23, 2015

    I guess there's some version mixup here: From Python 3.3 on
    the integrated C version of decimal does not store the digits
    as a string and does not have the private _int method.

    Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.

    Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
    [GCC 5.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal
    >>> Decimal("100.00")
    Decimal('100.00')
    >>> Decimal("100.00")._int
    '10000'

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 23, 2015

    On Wed, Dec 23, 2015 at 09:01:22PM +0000, John Walker wrote:
    > Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.
    > 
    > Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
    > [GCC 5.2.0] on linux
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> from decimal import Decimal
    > >>> Decimal("100.00")
    > Decimal('100.00')
    > >>> Decimal("100.00")._int
    > '10000'

    That should only happen if the C version did not build for some reason:

    Python 3.6.0a0 (default:323c10701e5d, Dec 14 2015, 14:28:41) 
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal
    >>> Decimal("100.00")._int
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'decimal.Decimal' object has no attribute '_int'
    >>>

    @johnwalker
    Copy link
    Mannequin Author

    johnwalker mannequin commented Dec 23, 2015

    That should only happen if the C version did not build for some reason:

    Ahh, gotcha. I assume one instance where this happens is when the machine doesn't have libmpdec.so

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 23, 2015

    No, the regular build uses the libmpdec that is shipped with
    Python. The external libmpdec.so only comes into play if you
    compile --with-system-libmpdec.

    @johnwalker
    Copy link
    Mannequin Author

    johnwalker mannequin commented Dec 23, 2015

    No, the regular build uses the libmpdec that is shipped with
    Python. The external libmpdec.so only comes into play if you
    compile --with-system-libmpdec.

    Oh, OK. I see whats happening. My distro deletes the shipped version and compiles --with-system-libmpdec. We're on the same page now, thanks.

    https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/python

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 23, 2015

    Let's re-target this issue:

    Implementing as_integer_ratio() sounds reasonable, but let's hope
    people won't try to convert Decimal('1E+999999999999999999').

    Mark, was there perhaps a reason not to add the method?

    @skrah skrah mannequin changed the title Improve performance of statistics._decimal_to_ratio and fractions.from_decimal Add Decimal.as_integer_ratio() Dec 23, 2015
    @skrah skrah mannequin self-assigned this Dec 23, 2015
    @terryjreedy
    Copy link
    Member

    A visible new feature is an enhancement (even if performance is the reason for the new feature) and can only go in 3.6.

    @terryjreedy terryjreedy added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Dec 25, 2015
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 27, 2015

    Previously the method was rejected in bpo-8947. But the
    speedup is quite dramatic. This is a benchmark for
    converting "1.91918261298362195e-100", 1000000 repetitions:

    float.as_integer_ratio: 0.548023
    Decimal.as_integer_ratio: normalize=False: 2.661191
    Decimal.as_integer_ratio: normalize=True: 4.382903
    Fraction.from_decimal: 29.436584

    @mdickinson
    Copy link
    Member

    Mark, was there perhaps a reason not to add the method?

    In the past, there's been a desire to keep the decimal module minimal in scope, implementing little more than what's required for the specification. A number of proposed additions have been rejected on this basis.

    Ah, and now I read Stefan's reference to bpo-8947. I'm not sure anything has really changed since that discussion.

    Count me as -0E0 on the proposal.

    Off-topic: I wonder whether int.as_integer_ratio should be implemented (for the same reasons that int.numerator, int.real and so on are).

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 27, 2015

    Keeping the API small is a good reason against it.

    There aren't many ways to destructure a Decimal though: As far
    as I know, the return value of as_tuple() dates back from the time
    when the coefficient was actually a tuple. The function is slow
    and not really nice to use.

    Probably many people would be happy if we added as_integer_ratio()
    and an as_triple() function that represents the coefficient as
    an integer.

    Off-topic: I wonder whether int.as_integer_ratio should be implemented (for the same reasons that int.numerator, int.real and so on are).

    If users want to duck-type, I think yes.

    @serhiy-storchaka
    Copy link
    Member

    Arguments against as_integer_ratio() look weighty. But may be there are less arguments against numerator and denominator?

    @rhettinger
    Copy link
    Contributor

    Now that there is more than one use case for Decimal.as_integer_ratio(), I'll add my support to this feature request.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 28, 2015

    Here's a patch. The Python and doc parts are from Mark (bpo-8947).

    I did not optimize the normalization yet, in C the code is
    less clean and there were some corner cases where the gcd is
    actually faster.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 28, 2015

    New changeset f3b09c269af0 by Stefan Krah in branch 'default':
    Issue bpo-25928: Add Decimal.as_integer_ratio(). Python parts and docs by
    https://hg.python.org/cpython/rev/f3b09c269af0

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 28, 2015

    Hopefully I wasn't moving too fast, but I won't have time in the next
    days/weeks.

    Serhiy and Alexander (bpo-8947) would prefer more homogeneous error
    messages and docstrings between float/pydecimal/cdecimal.

    I wouldn't mind a patch in another issue (no argument clinic!),
    provided that we agree on something and have input from the
    native English speakers.

    Thanks for the review!

    @serhiy-storchaka
    Copy link
    Member

    Hopefully I wasn't moving too fast, but I won't have time in the next
    days/weeks.

    No, the patch is pretty clear. Thanks.

    Serhiy and Alexander (bpo-8947) would prefer more homogeneous error
    messages and docstrings between float/pydecimal/cdecimal.

    This would help to optimize creating a Fraction. Opened bpo-25971 for this.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 28, 2015

    New changeset 510ff609cb4f by Stefan Krah in branch 'default':
    Issue bpo-25928: Temporarily disable some tests in test_statistics in order
    https://hg.python.org/cpython/rev/510ff609cb4f

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 28, 2015

    Steven, could you have a look at the failures in test_statistics?
    Some tests fail because they assume non-normalized fractions, I
    still have to look at the other assumptions.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 28, 2015

    Ah yes, the test_statistics failures look like bpo-18841 again.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Dec 29, 2015

    I've opened bpo-25974 for the statistics.py issues.

    @skrah skrah mannequin closed this as completed Dec 29, 2015
    @rhettinger
    Copy link
    Contributor

    I don't think a new public method should have been added.

    Historically, we've been careful to not grow the API beyond what is in the spec or the dunder methods required to interface with standard Python.

    The feature creep is at odds with the intended goals for the module that have been present since the outset. As long as the spec remains unchanged, the API for this module should be treated as stable.

    Another issue is that the API for the module is already so large that it impairs usability. Please don't make it worse by adding new methods and inventing details that aren't in the spec.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Apr 5, 2016

    Raymond, you added your support in msg257097. I'm not very happy to spend my time implementing the feature and then rehashing everything after 3 months.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants