msg227284 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 17:37 |
Fractions are an excellent way to do exact money calculations and largely beat Decimal in terms of simplicity, accuracy and safety. Clearly not in terms of speed, though.
The current implementation does some heavy type checking and dispatching in __new__() and a simplistic gcd based normalisation. Here is a profiling run from the benchmark proposed in issue 22458 (which matches more or less the results with my own code):
6969671 function calls (6969278 primitive calls) in 4.835 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
519644 1.324 0.000 2.928 0.000 fractions.py:73(__new__)
519632 0.654 0.000 0.654 0.000 fractions.py:17(gcd)
319744 0.637 0.000 2.694 0.000 fractions.py:400(_add)
1039260 0.507 0.000 0.950 0.000 abc.py:178(__instancecheck__)
4 0.459 0.115 4.821 1.205 bm_telco_fractions.py:38(run)
1039300 0.443 0.000 0.443 0.000 _weakrefset.py:70(__contains__)
519616 0.301 0.000 4.329 0.000 fractions.py:373(forward)
199872 0.272 0.000 1.335 0.000 fractions.py:416(_mul)
1598720 0.117 0.000 0.117 0.000 fractions.py:278(denominator)
959232 0.074 0.000 0.074 0.000 fractions.py:274(numerator)
The instantiation itself takes twice as much time as the gcd calculations, and both dominate the overall runtime of the benchmark by about 60%. Improving the instantiation time would thus bring a substantial benefit.
|
msg227290 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 18:25 |
Adding Mark Dickinson to the noisy list who mentioned having worked on a C implementation for gcd(). I think this would be a good thing to try. However, the most important part would be to restructure and specialise Fraction.__new__().
|
msg227292 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 18:42 |
Here is a straight forward patch that special cases int values in the constructor. It gives me a 35% speedup in the benchmark.
|
msg227294 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 18:58 |
Updated patch - there is a somewhat hidden attempt in the code to keep nominator and denominater plain int values, not subtypes. That means that it's safer to restrict the optimisation to plain ints as well, which should still hit >95% of the use cases.
|
msg227295 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-09-22 19:19 |
What speedup give you second change?
|
msg227298 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 19:23 |
The second isn't a big difference because it only hits plain instantiations from integers. They are less likely to be performance critical than those from a quotient, which happen for all calculations.
It's more for symmetry, I guess.
|
msg227299 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 19:35 |
I found one more place where special casing helps: equal comparisons to integers, e.g. f == 0 or f == 1 or so. timeit shows me a speedup by a factor of three for this, with only a tiny slow-down when comparing fractions on both sides.
I put all of them in one patch, deselecting after applying should be easy enough.
|
msg227300 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-09-22 19:37 |
Microbenchmarks show about 2x speedup in both cases.
The patch LGTM.
|
msg227302 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 19:46 |
Benchmark profile after the patch:
5930670 function calls (5930288 primitive calls) in 3.748 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
519632 0.828 0.000 0.828 0.000 fractions.py:17(gcd)
519644 0.806 0.000 1.700 0.000 fractions.py:73(__new__)
319744 0.630 0.000 1.983 0.000 fractions.py:408(_add)
4 0.393 0.098 3.735 0.934 bm_telco_fractions.py:38(run)
519616 0.277 0.000 3.316 0.000 fractions.py:381(forward)
199872 0.275 0.000 0.901 0.000 fractions.py:424(_mul)
1598720 0.161 0.000 0.161 0.000 fractions.py:285(denominator)
520257 0.155 0.000 0.155 0.000 {built-in method isinstance}
959232 0.118 0.000 0.118 0.000 fractions.py:281(numerator)
519657 0.066 0.000 0.066 0.000 {built-in method __new__ of type object at 0x9d1c40}
Note that the gcd() call inside of __new__() now takes about as much time as the rest of the __new__() itself. It's clearly still worth optimising that.
|
msg227308 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-22 21:11 |
Here is another little optimisation that removes the redundant property lookups for the denominator in __add__() and __sub__().
New profile:
5291182 function calls (5290800 primitive calls) in 3.596 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
519632 0.843 0.000 0.843 0.000 fractions.py:17(gcd)
519644 0.800 0.000 1.709 0.000 fractions.py:73(__new__)
319744 0.520 0.000 1.816 0.000 fractions.py:408(_add)
4 0.401 0.100 3.582 0.896 bm_telco_fractions.py:38(run)
519616 0.291 0.000 3.156 0.000 fractions.py:381(forward)
199872 0.274 0.000 0.904 0.000 fractions.py:424(_mul)
520257 0.145 0.000 0.145 0.000 {built-in method isinstance}
959232 0.108 0.000 0.108 0.000 fractions.py:285(denominator)
959232 0.108 0.000 0.108 0.000 fractions.py:281(numerator)
519657 0.066 0.000 0.066 0.000 {built-in method __new__ of type object at 0x9d1c40}
|
msg227395 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-23 22:09 |
This simple Cython variant of gcd() is substantially faster for me (you may consider it pseudo-code for a C implementation):
def _gcd(a, b):
# Try doing all computation in C space. If the numbers are too large
# at the beginning, retry until they are small enough.
cdef long long ai, bi
while b:
try:
ai, bi = a, b
except OverflowError:
pass
else:
# switch to C loop
while bi:
ai, bi = bi, ai%bi
return ai
a, b = b, a%b
return a
It tries to drop the two numbers into C long-long-ints as soon as possible, and only runs the calculation in Python space until they are small enough to fit. In most cases, this will be either right from the start or fairly quickly after only a few iterations.
|
msg227406 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-24 06:29 |
BTW, the last two patches (int4 and redundant_property) are ready to be applied. Anyone?
|
msg227407 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-09-24 06:40 |
New changeset 646bc7d3544b by Georg Brandl in branch 'default':
#22464: Speed up common Fraction operations by special-casing several
https://hg.python.org/cpython/rev/646bc7d3544b
|
msg227408 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2014-09-24 06:41 |
Left this open in case you have additional patches coming.
|
msg227488 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-24 20:51 |
I created issue 22486 about the gcd() performance. I think we can close this ticket - I don't see any more obvious low hanging fruit and future findings can have their own ticket.
Out of interest, I modified the fractions module to compile Fraction into an extension type with Cython. For the benchmark, it currently gives me a 10x speedup over the original fractions module in Py3.4. I uploaded my initial attempt to PyPI and it's on github:
https://pypi.python.org/pypi/quicktions
https://github.com/scoder/quicktions
That makes a C implementation of Fraction generally worth considering for CPython.
|
msg227576 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-25 21:06 |
Sorry for reopening this, but I found one more thing. Division is pretty heavy on PyLong objects and there doesn't seem to be an internal optimisation for division by 1 and -1. Since many Fraction input values can already be normalised for some reason, the following change shaves off almost 30% of the calls to PyNumber_InPlaceFloorDivide() in the telco benchmark during Fraction instantiation according to callgrind, thus saving 20% of the CPU instructions that go into tp_new().
Instead of always executing
numerator //= g
denominator //= g
this avoids unnecessary divisions:
if g is not 1:
if g is -1:
numerator = -numerator
denominator = -denominator
else:
numerator //= g
denominator //= g
Using the "is" operator here is CPythonish, but doing the same with != and == should also be an improvement already, although not as cheap. Now, given that CPython caches small integers internally, I would suggest actually not adding these two special cases to the fractions module but more generally to the division routines in longobject.c.
|
msg227591 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2014-09-26 07:53 |
I tried it, but it seems better to open a new ticket for this as there are behavioural changes. See #22501.
|
msg227593 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-09-26 08:30 |
Please do not use "is" for number comparison. This can be broken unexpectedly in future or on alternative implementation.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:08 | admin | set | github: 66654 |
2014-09-26 08:30:22 | serhiy.storchaka | set | messages:
+ msg227593 |
2014-09-26 07:53:10 | scoder | set | status: open -> closed
messages:
+ msg227591 |
2014-09-25 21:06:12 | scoder | set | status: closed -> open
messages:
+ msg227576 |
2014-09-24 20:51:54 | scoder | set | status: open -> closed resolution: fixed messages:
+ msg227488
|
2014-09-24 06:41:11 | georg.brandl | set | nosy:
+ georg.brandl messages:
+ msg227408
|
2014-09-24 06:40:46 | python-dev | set | nosy:
+ python-dev messages:
+ msg227407
|
2014-09-24 06:29:23 | scoder | set | messages:
+ msg227406 |
2014-09-23 22:09:12 | scoder | set | messages:
+ msg227395 |
2014-09-23 06:46:35 | rhettinger | set | nosy:
+ rhettinger
|
2014-09-22 21:11:45 | scoder | set | files:
+ avoid_redundant_property_lookups.patch
messages:
+ msg227308 |
2014-09-22 19:46:55 | scoder | set | messages:
+ msg227302 |
2014-09-22 19:37:01 | serhiy.storchaka | set | messages:
+ msg227300 |
2014-09-22 19:35:53 | scoder | set | files:
+ special_case_int4.patch
messages:
+ msg227299 |
2014-09-22 19:23:27 | scoder | set | messages:
+ msg227298 |
2014-09-22 19:19:19 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg227295
|
2014-09-22 19:10:49 | scoder | set | files:
+ special_case_int3.patch |
2014-09-22 18:58:34 | scoder | set | files:
+ special_case_int2.patch
messages:
+ msg227294 |
2014-09-22 18:45:44 | scoder | set | type: performance |
2014-09-22 18:42:48 | scoder | set | files:
+ special_case_int.patch keywords:
+ patch messages:
+ msg227292
|
2014-09-22 18:25:02 | scoder | set | nosy:
+ mark.dickinson messages:
+ msg227290
|
2014-09-22 17:37:14 | scoder | create | |