classification
Title: Add Decimal.as_integer_ratio()
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: belopolsky, gvanrossum, johnwalker, mark.dickinson, python-dev, rhettinger, serhiy.storchaka, skrah, steven.daprano, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2015-12-22 22:18 by johnwalker, last changed 2016-04-05 10:10 by skrah. This issue is now closed.

Files
File name Uploaded Description Edit
issue25928.diff skrah, 2015-12-28 14:12 review
Messages (27)
msg256873 - (view) Author: John Walker (johnwalker) * Date: 2015-12-22 22:18
In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says:

# FIXME This is faster than Fraction.from_decimal, but still too slow.

Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers.

This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly.

In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))):


def _decimal_to_ratio(d):
    sign, exp = d._sign, d._exp
    if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
        assert not d.is_finite()
        return (d, None)
    num = int(d._int)
    if exp < 0:
        den = 10**-exp
    else:
        num *= 10**exp
        den = 1
    if sign:
        num = -num
    return (num, den)

If the performance improvement is considered worthwhile, here are a few solutions I see.

1) Use _int directly in fractions and statistics.

2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway. 

3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple.

4) Deprecate _int in Decimal and add a new reference str_digits.

There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods. 

Here is what I used for benchmarks:

========

import timeit

old_setup = """
import random
from decimal import Decimal

def _decimal_to_ratio(d):
    sign, digits, exp = d.as_tuple()
    if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
        assert not d.is_finite()
        return (d, None)
    num = 0
    for digit in digits:
        num = num*10 + digit
    if exp < 0:
        den = 10**-exp
    else:
        num *= 10**exp
        den = 1
    if sign:
        num = -num
    return (num, den)

def run_it():
    dec = Decimal(str(random.random()))
    _decimal_to_ratio(dec)
"""

new_setup = """
import random
from decimal import Decimal

def _decimal_to_ratio(d):
    sign, exp = d._sign, d._exp
    if exp in ('F', 'n', 'N'):  # INF, NAN, sNAN
        assert not d.is_finite()
        return (d, None)
    num = int(d._int)
    if exp < 0:
        den = 10**-exp
    else:
        num *= 10**exp
        den = 1
    if sign:
        num = -num
    return (num, den)

def run_it():
    dec = Decimal(str(random.random()))
    _decimal_to_ratio(dec)
"""

if __name__ == '__main__':
    print("Testing proposed implementation")
    print("number = 10000")
    print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000))
    print("number = 100000")         
    print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000))
    print("number = 1000000")     
    print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000))

    print("Testing old implementation")
    print("number = 10000")
    print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000))
    print("number = 100000")         
    print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000))
    print("number = 1000000")     
    print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))
msg256874 - (view) Author: John Walker (johnwalker) * Date: 2015-12-22 22:21
Heres the output of running the benchmark on my machine:

Testing proposed implementation
number = 10000
0.07098613299967838
number = 100000
0.6952260910002224
number = 1000000
6.948197601999709
Testing current implementation
number = 10000
0.1418162760000996
number = 100000
1.350394603001405
number = 1000000
13.625065807000283
msg256914 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 15:07
I guess there's some version mixup here:  From Python 3.3 on
the integrated C version of decimal does not store the digits
as a string and does not have the private _int method.
msg256917 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-23 15:18
May be implement the as_integer_ratio() method and/or numerator and denominator attributes in the Decimal class?
msg256932 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-12-23 20:57
On Wed, Dec 23, 2015 at 03:18:28PM +0000, Serhiy Storchaka wrote:
> May be implement the as_integer_ratio() method and/or numerator and 
> denominator attributes in the Decimal class?

That would also be good as it will decrease the API differences between 
floats and Decimals and make it easier to duck-type one for the other.
msg256933 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 21:01
> I guess there's some version mixup here:  From Python 3.3 on
> the integrated C version of decimal does not store the digits
> as a string and does not have the private _int method.

Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.

Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> Decimal("100.00")
Decimal('100.00')
>>> Decimal("100.00")._int
'10000'
msg256934 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 21:30
On Wed, Dec 23, 2015 at 09:01:22PM +0000, John Walker wrote:
> Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.
> 
> Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
> [GCC 5.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from decimal import Decimal
> >>> Decimal("100.00")
> Decimal('100.00')
> >>> Decimal("100.00")._int
> '10000'

That should only happen if the C version did not build for some reason:

Python 3.6.0a0 (default:323c10701e5d, Dec 14 2015, 14:28:41) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> Decimal("100.00")._int
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'decimal.Decimal' object has no attribute '_int'
>>>
msg256936 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 21:46
> That should only happen if the C version did not build for some reason:

Ahh, gotcha. I assume one instance where this happens is when the machine doesn't have libmpdec.so
msg256937 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 21:50
No, the regular build uses the libmpdec that is shipped with
Python.  The external libmpdec.so only comes into play if you
compile --with-system-libmpdec.
msg256938 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 22:05
> No, the regular build uses the libmpdec that is shipped with
> Python.  The external libmpdec.so only comes into play if you
> compile --with-system-libmpdec.

Oh, OK. I see whats happening. My distro deletes the shipped version and compiles --with-system-libmpdec. We're on the same page now, thanks.

https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/python
msg256940 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 22:25
Let's re-target this issue:

Implementing as_integer_ratio() sounds reasonable, but let's hope
people won't try to convert Decimal('1E+999999999999999999').


Mark, was there perhaps a reason not to add the method?
msg257003 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-12-25 17:58
A visible new feature is an enhancement (even if performance is the reason for the new feature) and can only go in 3.6.
msg257073 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-27 12:13
Previously the method was rejected in #8947.  But the
speedup is quite dramatic. This is a benchmark for
converting "1.91918261298362195e-100", 1000000 repetitions:


float.as_integer_ratio: 0.548023
Decimal.as_integer_ratio: normalize=False: 2.661191
Decimal.as_integer_ratio: normalize=True: 4.382903
Fraction.from_decimal: 29.436584
msg257088 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-12-27 19:12
> Mark, was there perhaps a reason not to add the method?

In the past, there's been a desire to keep the decimal module minimal in scope, implementing little more than what's required for the specification. A number of proposed additions have been rejected on this basis.

Ah, and now I read Stefan's reference to #8947. I'm not sure anything has really changed since that discussion.

Count me as -0E0 on the proposal.

Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are).
msg257090 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-27 19:53
Keeping the API small is a good reason against it.

There aren't many ways to destructure a Decimal though:  As far
as I know, the return value of as_tuple() dates back from the time
when the coefficient was actually a tuple.  The function is slow
and not really nice to use.

Probably many people would be happy if we added as_integer_ratio()
and an as_triple() function that represents the coefficient as
an integer.


> Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are).

If users want to duck-type, I think yes.
msg257094 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-27 20:10
Arguments against as_integer_ratio() look weighty. But may be there are less arguments against numerator and denominator?
msg257097 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-12-27 20:30
Now that there is more than one use case for Decimal.as_integer_ratio(), I'll add my support to this feature request.
msg257115 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 14:12
Here's a patch. The Python and doc parts are from Mark (#8947).

I did not optimize the normalization yet, in C the code is
less clean and there were some corner cases where the gcd is
actually faster.
msg257137 - (view) Author: Roundup Robot (python-dev) Date: 2015-12-28 22:14
New changeset f3b09c269af0 by Stefan Krah in branch 'default':
Issue #25928: Add Decimal.as_integer_ratio(). Python parts and docs by
https://hg.python.org/cpython/rev/f3b09c269af0
msg257141 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 22:37
Hopefully I wasn't moving too fast, but I won't have time in the next
days/weeks.

Serhiy and Alexander (#8947) would prefer more homogeneous error
messages and docstrings between float/pydecimal/cdecimal.

I wouldn't mind a patch in another issue (no argument clinic!),
provided that we agree on something and have input from the
native English speakers.


Thanks for the review!
msg257147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-28 22:52
> Hopefully I wasn't moving too fast, but I won't have time in the next
days/weeks.

No, the patch is pretty clear. Thanks.

> Serhiy and Alexander (#8947) would prefer more homogeneous error
messages and docstrings between float/pydecimal/cdecimal.

This would help to optimize creating a Fraction. Opened issue25971 for this.
msg257149 - (view) Author: Roundup Robot (python-dev) Date: 2015-12-28 23:12
New changeset 510ff609cb4f by Stefan Krah in branch 'default':
Issue #25928: Temporarily disable some tests in test_statistics in order
https://hg.python.org/cpython/rev/510ff609cb4f
msg257150 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 23:15
Steven, could you have a look at the failures in test_statistics?
Some tests fail because they assume non-normalized fractions, I
still have to look at the other assumptions.
msg257151 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 23:47
Ah yes, the test_statistics failures look like #18841 again.
msg257179 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-29 12:51
I've opened #25974 for the statistics.py issues.
msg262887 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-04-05 02:39
I don't think a new public method should have been added.

Historically, we've been careful to not grow the API beyond what is in the spec or the dunder methods required to interface with standard Python.

The feature creep is at odds with the intended goals for the module that have been present since the outset.  As long as the spec remains unchanged, the API for this module should be treated as stable.

Another issue is that the API for the module is already so large that it impairs usability.  Please don't make it worse by adding new methods and inventing details that aren't in the spec.
msg262896 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-05 10:10
Raymond, you added your support in msg257097.  I'm not very happy to spend my time implementing the feature and then rehashing everything after 3 months.
History
Date User Action Args
2016-04-05 10:10:41skrahsetnosy: + gvanrossum
messages: + msg262896
2016-04-05 02:39:13rhettingersetnosy: + tim.peters
messages: + msg262887
2015-12-29 12:51:14skrahsetstatus: open -> closed
resolution: fixed
messages: + msg257179

stage: patch review -> resolved
2015-12-28 23:47:09skrahsetmessages: + msg257151
2015-12-28 23:15:21skrahsetmessages: + msg257150
2015-12-28 23:12:04python-devsetmessages: + msg257149
2015-12-28 22:52:50serhiy.storchakasetmessages: + msg257147
2015-12-28 22:37:29skrahsetmessages: + msg257141
2015-12-28 22:14:15python-devsetnosy: + python-dev
messages: + msg257137
2015-12-28 16:47:16serhiy.storchakasetstage: test needed -> patch review
2015-12-28 14:12:26skrahsetfiles: + issue25928.diff
keywords: + patch
messages: + msg257115
2015-12-27 20:30:25rhettingersetmessages: + msg257097
2015-12-27 20:10:57serhiy.storchakasetmessages: + msg257094
2015-12-27 19:53:37skrahsetmessages: + msg257090
2015-12-27 19:12:46mark.dickinsonsetmessages: + msg257088
2015-12-27 12:13:24skrahsetnosy: + rhettinger, belopolsky
messages: + msg257073
2015-12-25 17:58:40terry.reedysetversions: + Python 3.6, - Python 3.5
nosy: + terry.reedy

messages: + msg257003

type: performance -> enhancement
stage: test needed
2015-12-23 22:25:16skrahsetnosy: + mark.dickinson
title: Improve performance of statistics._decimal_to_ratio and fractions.from_decimal -> Add Decimal.as_integer_ratio()
messages: + msg256940

assignee: skrah
2015-12-23 22:05:14johnwalkersetmessages: + msg256938
2015-12-23 21:50:32skrahsetmessages: + msg256937
2015-12-23 21:46:39johnwalkersetmessages: + msg256936
2015-12-23 21:30:05skrahsetmessages: + msg256934
2015-12-23 21:01:22johnwalkersetmessages: + msg256933
versions: + Python 3.5
2015-12-23 20:57:13steven.dapranosetmessages: + msg256932
2015-12-23 15:18:28serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg256917
2015-12-23 15:08:50skrahsetversions: - Python 3.5
2015-12-23 15:07:35skrahsetnosy: + steven.daprano
2015-12-23 15:07:15skrahsetnosy: + skrah
messages: + msg256914
2015-12-22 22:49:34johnwalkersettype: performance
2015-12-22 22:21:02johnwalkersetmessages: + msg256874
2015-12-22 22:18:25johnwalkercreate