msg256873 - (view) |
Author: John Walker (johnwalker) * |
Date: 2015-12-22 22:18 |
In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says:
# FIXME This is faster than Fraction.from_decimal, but still too slow.
Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers.
This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly.
In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))):
def _decimal_to_ratio(d):
sign, exp = d._sign, d._exp
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = int(d._int)
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
If the performance improvement is considered worthwhile, here are a few solutions I see.
1) Use _int directly in fractions and statistics.
2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway.
3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple.
4) Deprecate _int in Decimal and add a new reference str_digits.
There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods.
Here is what I used for benchmarks:
========
import timeit
old_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
sign, digits, exp = d.as_tuple()
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = 0
for digit in digits:
num = num*10 + digit
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
def run_it():
dec = Decimal(str(random.random()))
_decimal_to_ratio(dec)
"""
new_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
sign, exp = d._sign, d._exp
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = int(d._int)
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
def run_it():
dec = Decimal(str(random.random()))
_decimal_to_ratio(dec)
"""
if __name__ == '__main__':
print("Testing proposed implementation")
print("number = 10000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000))
print("number = 100000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000))
print("number = 1000000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000))
print("Testing old implementation")
print("number = 10000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000))
print("number = 100000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000))
print("number = 1000000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))
|
msg256874 - (view) |
Author: John Walker (johnwalker) * |
Date: 2015-12-22 22:21 |
Heres the output of running the benchmark on my machine:
Testing proposed implementation
number = 10000
0.07098613299967838
number = 100000
0.6952260910002224
number = 1000000
6.948197601999709
Testing current implementation
number = 10000
0.1418162760000996
number = 100000
1.350394603001405
number = 1000000
13.625065807000283
|
msg256914 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-23 15:07 |
I guess there's some version mixup here: From Python 3.3 on
the integrated C version of decimal does not store the digits
as a string and does not have the private _int method.
|
msg256917 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-12-23 15:18 |
May be implement the as_integer_ratio() method and/or numerator and denominator attributes in the Decimal class?
|
msg256932 - (view) |
Author: Steven D'Aprano (steven.daprano) * |
Date: 2015-12-23 20:57 |
On Wed, Dec 23, 2015 at 03:18:28PM +0000, Serhiy Storchaka wrote:
> May be implement the as_integer_ratio() method and/or numerator and
> denominator attributes in the Decimal class?
That would also be good as it will decrease the API differences between
floats and Decimals and make it easier to duck-type one for the other.
|
msg256933 - (view) |
Author: John Walker (johnwalker) * |
Date: 2015-12-23 21:01 |
> I guess there's some version mixup here: From Python 3.3 on
> the integrated C version of decimal does not store the digits
> as a string and does not have the private _int method.
Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.
Python 3.5.1 (default, Dec 7 2015, 12:58:09)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> Decimal("100.00")
Decimal('100.00')
>>> Decimal("100.00")._int
'10000'
|
msg256934 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-23 21:30 |
On Wed, Dec 23, 2015 at 09:01:22PM +0000, John Walker wrote:
> Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions.
>
> Python 3.5.1 (default, Dec 7 2015, 12:58:09)
> [GCC 5.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from decimal import Decimal
> >>> Decimal("100.00")
> Decimal('100.00')
> >>> Decimal("100.00")._int
> '10000'
That should only happen if the C version did not build for some reason:
Python 3.6.0a0 (default:323c10701e5d, Dec 14 2015, 14:28:41)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal
>>> Decimal("100.00")._int
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'decimal.Decimal' object has no attribute '_int'
>>>
|
msg256936 - (view) |
Author: John Walker (johnwalker) * |
Date: 2015-12-23 21:46 |
> That should only happen if the C version did not build for some reason:
Ahh, gotcha. I assume one instance where this happens is when the machine doesn't have libmpdec.so
|
msg256937 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-23 21:50 |
No, the regular build uses the libmpdec that is shipped with
Python. The external libmpdec.so only comes into play if you
compile --with-system-libmpdec.
|
msg256938 - (view) |
Author: John Walker (johnwalker) * |
Date: 2015-12-23 22:05 |
> No, the regular build uses the libmpdec that is shipped with
> Python. The external libmpdec.so only comes into play if you
> compile --with-system-libmpdec.
Oh, OK. I see whats happening. My distro deletes the shipped version and compiles --with-system-libmpdec. We're on the same page now, thanks.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/python
|
msg256940 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-23 22:25 |
Let's re-target this issue:
Implementing as_integer_ratio() sounds reasonable, but let's hope
people won't try to convert Decimal('1E+999999999999999999').
Mark, was there perhaps a reason not to add the method?
|
msg257003 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2015-12-25 17:58 |
A visible new feature is an enhancement (even if performance is the reason for the new feature) and can only go in 3.6.
|
msg257073 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-27 12:13 |
Previously the method was rejected in #8947. But the
speedup is quite dramatic. This is a benchmark for
converting "1.91918261298362195e-100", 1000000 repetitions:
float.as_integer_ratio: 0.548023
Decimal.as_integer_ratio: normalize=False: 2.661191
Decimal.as_integer_ratio: normalize=True: 4.382903
Fraction.from_decimal: 29.436584
|
msg257088 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2015-12-27 19:12 |
> Mark, was there perhaps a reason not to add the method?
In the past, there's been a desire to keep the decimal module minimal in scope, implementing little more than what's required for the specification. A number of proposed additions have been rejected on this basis.
Ah, and now I read Stefan's reference to #8947. I'm not sure anything has really changed since that discussion.
Count me as -0E0 on the proposal.
Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are).
|
msg257090 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-27 19:53 |
Keeping the API small is a good reason against it.
There aren't many ways to destructure a Decimal though: As far
as I know, the return value of as_tuple() dates back from the time
when the coefficient was actually a tuple. The function is slow
and not really nice to use.
Probably many people would be happy if we added as_integer_ratio()
and an as_triple() function that represents the coefficient as
an integer.
> Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are).
If users want to duck-type, I think yes.
|
msg257094 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-12-27 20:10 |
Arguments against as_integer_ratio() look weighty. But may be there are less arguments against numerator and denominator?
|
msg257097 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2015-12-27 20:30 |
Now that there is more than one use case for Decimal.as_integer_ratio(), I'll add my support to this feature request.
|
msg257115 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-28 14:12 |
Here's a patch. The Python and doc parts are from Mark (#8947).
I did not optimize the normalization yet, in C the code is
less clean and there were some corner cases where the gcd is
actually faster.
|
msg257137 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-12-28 22:14 |
New changeset f3b09c269af0 by Stefan Krah in branch 'default':
Issue #25928: Add Decimal.as_integer_ratio(). Python parts and docs by
https://hg.python.org/cpython/rev/f3b09c269af0
|
msg257141 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-28 22:37 |
Hopefully I wasn't moving too fast, but I won't have time in the next
days/weeks.
Serhiy and Alexander (#8947) would prefer more homogeneous error
messages and docstrings between float/pydecimal/cdecimal.
I wouldn't mind a patch in another issue (no argument clinic!),
provided that we agree on something and have input from the
native English speakers.
Thanks for the review!
|
msg257147 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-12-28 22:52 |
> Hopefully I wasn't moving too fast, but I won't have time in the next
days/weeks.
No, the patch is pretty clear. Thanks.
> Serhiy and Alexander (#8947) would prefer more homogeneous error
messages and docstrings between float/pydecimal/cdecimal.
This would help to optimize creating a Fraction. Opened issue25971 for this.
|
msg257149 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-12-28 23:12 |
New changeset 510ff609cb4f by Stefan Krah in branch 'default':
Issue #25928: Temporarily disable some tests in test_statistics in order
https://hg.python.org/cpython/rev/510ff609cb4f
|
msg257150 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-28 23:15 |
Steven, could you have a look at the failures in test_statistics?
Some tests fail because they assume non-normalized fractions, I
still have to look at the other assumptions.
|
msg257151 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-28 23:47 |
Ah yes, the test_statistics failures look like #18841 again.
|
msg257179 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2015-12-29 12:51 |
I've opened #25974 for the statistics.py issues.
|
msg262887 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2016-04-05 02:39 |
I don't think a new public method should have been added.
Historically, we've been careful to not grow the API beyond what is in the spec or the dunder methods required to interface with standard Python.
The feature creep is at odds with the intended goals for the module that have been present since the outset. As long as the spec remains unchanged, the API for this module should be treated as stable.
Another issue is that the API for the module is already so large that it impairs usability. Please don't make it worse by adding new methods and inventing details that aren't in the spec.
|
msg262896 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2016-04-05 10:10 |
Raymond, you added your support in msg257097. I'm not very happy to spend my time implementing the feature and then rehashing everything after 3 months.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:25 | admin | set | github: 70116 |
2018-03-13 19:04:17 | belopolsky | link | issue8947 superseder |
2016-04-05 10:10:41 | skrah | set | nosy:
+ gvanrossum messages:
+ msg262896
|
2016-04-05 02:39:13 | rhettinger | set | nosy:
+ tim.peters messages:
+ msg262887
|
2015-12-29 12:51:14 | skrah | set | status: open -> closed resolution: fixed messages:
+ msg257179
stage: patch review -> resolved |
2015-12-28 23:47:09 | skrah | set | messages:
+ msg257151 |
2015-12-28 23:15:21 | skrah | set | messages:
+ msg257150 |
2015-12-28 23:12:04 | python-dev | set | messages:
+ msg257149 |
2015-12-28 22:52:50 | serhiy.storchaka | set | messages:
+ msg257147 |
2015-12-28 22:37:29 | skrah | set | messages:
+ msg257141 |
2015-12-28 22:14:15 | python-dev | set | nosy:
+ python-dev messages:
+ msg257137
|
2015-12-28 16:47:16 | serhiy.storchaka | set | stage: test needed -> patch review |
2015-12-28 14:12:26 | skrah | set | files:
+ issue25928.diff keywords:
+ patch messages:
+ msg257115
|
2015-12-27 20:30:25 | rhettinger | set | messages:
+ msg257097 |
2015-12-27 20:10:57 | serhiy.storchaka | set | messages:
+ msg257094 |
2015-12-27 19:53:37 | skrah | set | messages:
+ msg257090 |
2015-12-27 19:12:46 | mark.dickinson | set | messages:
+ msg257088 |
2015-12-27 12:13:24 | skrah | set | nosy:
+ rhettinger, belopolsky messages:
+ msg257073
|
2015-12-25 17:58:40 | terry.reedy | set | versions:
+ Python 3.6, - Python 3.5 nosy:
+ terry.reedy
messages:
+ msg257003
type: performance -> enhancement stage: test needed |
2015-12-23 22:25:16 | skrah | set | nosy:
+ mark.dickinson title: Improve performance of statistics._decimal_to_ratio and fractions.from_decimal -> Add Decimal.as_integer_ratio() messages:
+ msg256940
assignee: skrah |
2015-12-23 22:05:14 | johnwalker | set | messages:
+ msg256938 |
2015-12-23 21:50:32 | skrah | set | messages:
+ msg256937 |
2015-12-23 21:46:39 | johnwalker | set | messages:
+ msg256936 |
2015-12-23 21:30:05 | skrah | set | messages:
+ msg256934 |
2015-12-23 21:01:22 | johnwalker | set | messages:
+ msg256933 versions:
+ Python 3.5 |
2015-12-23 20:57:13 | steven.daprano | set | messages:
+ msg256932 |
2015-12-23 15:18:28 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg256917
|
2015-12-23 15:08:50 | skrah | set | versions:
- Python 3.5 |
2015-12-23 15:07:35 | skrah | set | nosy:
+ steven.daprano
|
2015-12-23 15:07:15 | skrah | set | nosy:
+ skrah messages:
+ msg256914
|
2015-12-22 22:49:34 | johnwalker | set | type: performance |
2015-12-22 22:21:02 | johnwalker | set | messages:
+ msg256874 |
2015-12-22 22:18:25 | johnwalker | create | |