classification
Title: Speed up fractions implementation
Type: performance Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: georg.brandl, mark.dickinson, python-dev, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-09-22 17:37 by scoder, last changed 2014-09-26 08:30 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
special_case_int.patch scoder, 2014-09-22 18:42 special case "int" values in the constructor
special_case_int2.patch scoder, 2014-09-22 18:58 special case only exact "int" values in the constructor
special_case_int3.patch scoder, 2014-09-22 19:10 same as before with a little twist in type check
special_case_int4.patch scoder, 2014-09-22 19:35 same as before + faster "f == intval"
avoid_redundant_property_lookups.patch scoder, 2014-09-22 21:11
Messages (18)
msg227284 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 17:37
Fractions are an excellent way to do exact money calculations and largely beat Decimal in terms of simplicity, accuracy and safety. Clearly not in terms of speed, though.

The current implementation does some heavy type checking and dispatching in __new__() and a simplistic gcd based normalisation. Here is a profiling run from the benchmark proposed in issue 22458 (which matches more or less the results with my own code):

         6969671 function calls (6969278 primitive calls) in 4.835 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   519644    1.324    0.000    2.928    0.000 fractions.py:73(__new__)
   519632    0.654    0.000    0.654    0.000 fractions.py:17(gcd)
   319744    0.637    0.000    2.694    0.000 fractions.py:400(_add)
  1039260    0.507    0.000    0.950    0.000 abc.py:178(__instancecheck__)
        4    0.459    0.115    4.821    1.205 bm_telco_fractions.py:38(run)
  1039300    0.443    0.000    0.443    0.000 _weakrefset.py:70(__contains__)
   519616    0.301    0.000    4.329    0.000 fractions.py:373(forward)
   199872    0.272    0.000    1.335    0.000 fractions.py:416(_mul)
  1598720    0.117    0.000    0.117    0.000 fractions.py:278(denominator)
   959232    0.074    0.000    0.074    0.000 fractions.py:274(numerator)

The instantiation itself takes twice as much time as the gcd calculations, and both dominate the overall runtime of the benchmark by about 60%. Improving the instantiation time would thus bring a substantial benefit.
msg227290 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 18:25
Adding Mark Dickinson to the noisy list who mentioned having worked on a C implementation for gcd(). I think this would be a good thing to try. However, the most important part would be to restructure and specialise Fraction.__new__().
msg227292 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 18:42
Here is a straight forward patch that special cases int values in the constructor. It gives me a 35% speedup in the benchmark.
msg227294 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 18:58
Updated patch - there is a somewhat hidden attempt in the code to keep nominator and denominater plain int values, not subtypes. That means that it's safer to restrict the optimisation to plain ints as well, which should still hit >95% of the use cases.
msg227295 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-22 19:19
What speedup give you second change?
msg227298 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 19:23
The second isn't a big difference because it only hits plain instantiations from integers. They are less likely to be performance critical than those from a quotient, which happen for all calculations.

It's more for symmetry, I guess.
msg227299 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 19:35
I found one more place where special casing helps: equal comparisons to integers, e.g. f == 0 or f == 1 or so. timeit shows me a speedup by a factor of three for this, with only a tiny slow-down when comparing fractions on both sides.

I put all of them in one patch, deselecting after applying should be easy enough.
msg227300 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-22 19:37
Microbenchmarks show about 2x speedup in both cases.

The patch LGTM.
msg227302 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 19:46
Benchmark profile after the patch:

         5930670 function calls (5930288 primitive calls) in 3.748 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   519632    0.828    0.000    0.828    0.000 fractions.py:17(gcd)
   519644    0.806    0.000    1.700    0.000 fractions.py:73(__new__)
   319744    0.630    0.000    1.983    0.000 fractions.py:408(_add)
        4    0.393    0.098    3.735    0.934 bm_telco_fractions.py:38(run)
   519616    0.277    0.000    3.316    0.000 fractions.py:381(forward)
   199872    0.275    0.000    0.901    0.000 fractions.py:424(_mul)
  1598720    0.161    0.000    0.161    0.000 fractions.py:285(denominator)
   520257    0.155    0.000    0.155    0.000 {built-in method isinstance}
   959232    0.118    0.000    0.118    0.000 fractions.py:281(numerator)
   519657    0.066    0.000    0.066    0.000 {built-in method __new__ of type object at 0x9d1c40}

Note that the gcd() call inside of __new__() now takes about as much time as the rest of the __new__() itself. It's clearly still worth optimising that.
msg227308 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-22 21:11
Here is another little optimisation that removes the redundant property lookups for the denominator in __add__() and __sub__().

New profile:

         5291182 function calls (5290800 primitive calls) in 3.596 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   519632    0.843    0.000    0.843    0.000 fractions.py:17(gcd)
   519644    0.800    0.000    1.709    0.000 fractions.py:73(__new__)
   319744    0.520    0.000    1.816    0.000 fractions.py:408(_add)
        4    0.401    0.100    3.582    0.896 bm_telco_fractions.py:38(run)
   519616    0.291    0.000    3.156    0.000 fractions.py:381(forward)
   199872    0.274    0.000    0.904    0.000 fractions.py:424(_mul)
   520257    0.145    0.000    0.145    0.000 {built-in method isinstance}
   959232    0.108    0.000    0.108    0.000 fractions.py:285(denominator)
   959232    0.108    0.000    0.108    0.000 fractions.py:281(numerator)
   519657    0.066    0.000    0.066    0.000 {built-in method __new__ of type object at 0x9d1c40}
msg227395 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-23 22:09
This simple Cython variant of gcd() is substantially faster for me (you may consider it pseudo-code for a C implementation):

def _gcd(a, b):
    # Try doing all computation in C space.  If the numbers are too large
    # at the beginning, retry until they are small enough.
    cdef long long ai, bi
    while b:
        try:
            ai, bi = a, b
        except OverflowError:
            pass
        else:
            # switch to C loop
            while bi:
                ai, bi = bi, ai%bi
            return ai
        a, b = b, a%b
    return a

It tries to drop the two numbers into C long-long-ints as soon as possible, and only runs the calculation in Python space until they are small enough to fit. In most cases, this will be either right from the start or fairly quickly after only a few iterations.
msg227406 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-24 06:29
BTW, the last two patches (int4 and redundant_property) are ready to be applied. Anyone?
msg227407 - (view) Author: Roundup Robot (python-dev) Date: 2014-09-24 06:40
New changeset 646bc7d3544b by Georg Brandl in branch 'default':
#22464: Speed up common Fraction operations by special-casing several
https://hg.python.org/cpython/rev/646bc7d3544b
msg227408 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-09-24 06:41
Left this open in case you have additional patches coming.
msg227488 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-24 20:51
I created issue 22486 about the gcd() performance. I think we can close this ticket - I don't see any more obvious low hanging fruit and future findings can have their own ticket.

Out of interest, I modified the fractions module to compile Fraction into an extension type with Cython. For the benchmark, it currently gives me a 10x speedup over the original fractions module in Py3.4. I uploaded my initial attempt to PyPI and it's on github:

https://pypi.python.org/pypi/quicktions

https://github.com/scoder/quicktions

That makes a C implementation of Fraction generally worth considering for CPython.
msg227576 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-25 21:06
Sorry for reopening this, but I found one more thing. Division is pretty heavy on PyLong objects and there doesn't seem to be an internal optimisation for division by 1 and -1. Since many Fraction input values can already be normalised for some reason, the following change shaves off almost 30% of the calls to PyNumber_InPlaceFloorDivide() in the telco benchmark during Fraction instantiation according to callgrind, thus saving 20% of the CPU instructions that go into tp_new().

Instead of always executing

            numerator //= g
            denominator //= g

this avoids unnecessary divisions:

            if g is not 1:
                if g is -1:
                    numerator = -numerator
                    denominator = -denominator
                else:
                    numerator //= g
                    denominator //= g

Using the "is" operator here is CPythonish, but doing the same with != and == should also be an improvement already, although not as cheap. Now, given that CPython caches small integers internally, I would suggest actually not adding these two special cases to the fractions module but more generally to the division routines in longobject.c.
msg227591 - (view) Author: Stefan Behnel (scoder) * Date: 2014-09-26 07:53
I tried it, but it seems better to open a new ticket for this as there are behavioural changes. See #22501.
msg227593 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-26 08:30
Please do not use "is" for number comparison. This can be broken unexpectedly in future or on alternative implementation.
History
Date User Action Args
2014-09-26 08:30:22serhiy.storchakasetmessages: + msg227593
2014-09-26 07:53:10scodersetstatus: open -> closed

messages: + msg227591
2014-09-25 21:06:12scodersetstatus: closed -> open

messages: + msg227576
2014-09-24 20:51:54scodersetstatus: open -> closed
resolution: fixed
messages: + msg227488
2014-09-24 06:41:11georg.brandlsetnosy: + georg.brandl
messages: + msg227408
2014-09-24 06:40:46python-devsetnosy: + python-dev
messages: + msg227407
2014-09-24 06:29:23scodersetmessages: + msg227406
2014-09-23 22:09:12scodersetmessages: + msg227395
2014-09-23 06:46:35rhettingersetnosy: + rhettinger
2014-09-22 21:11:45scodersetfiles: + avoid_redundant_property_lookups.patch

messages: + msg227308
2014-09-22 19:46:55scodersetmessages: + msg227302
2014-09-22 19:37:01serhiy.storchakasetmessages: + msg227300
2014-09-22 19:35:53scodersetfiles: + special_case_int4.patch

messages: + msg227299
2014-09-22 19:23:27scodersetmessages: + msg227298
2014-09-22 19:19:19serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg227295
2014-09-22 19:10:49scodersetfiles: + special_case_int3.patch
2014-09-22 18:58:34scodersetfiles: + special_case_int2.patch

messages: + msg227294
2014-09-22 18:45:44scodersettype: performance
2014-09-22 18:42:48scodersetfiles: + special_case_int.patch
keywords: + patch
messages: + msg227292
2014-09-22 18:25:02scodersetnosy: + mark.dickinson
messages: + msg227290
2014-09-22 17:37:14scodercreate