classification
Title: Fractions instantiation revisited
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Eric.Wieser, josh.r, mark.dickinson, rhettinger, scoder, serhiy.storchaka, steven.daprano, wolma
Priority: normal Keywords: patch

Created on 2016-11-16 18:06 by wolma, last changed 2018-10-07 14:51 by steven.daprano.

Files
File name Uploaded Description Edit
fractions.patch wolma, 2016-11-16 18:06 review
Messages (12)
msg280977 - (view) Author: Wolfgang Maier (wolma) * Date: 2016-11-16 18:06
I've spent a bit of time lately trying to optimize the instantiation of Fractions. This is related to Issue22464, but instead of focusing on constructing Fractions from ints, my attempts revolve around improving instantiation from strings, floats and decimals.
I'm attaching a patch with all my changes for discussion, but here's an overview of what's in it:

CHANGES TO INSTANTIATION FROM STRINGS:

- isinstance checking for str is performed before the more expensive check for numbers.Rational (which has to go through abc)

- instantiation from strings doesn't use a regex pattern anymore for parsing; this is faster in many cases (and never slower) than the current version

- while reimplementing string parsing I added PEP 515 support (this is the only behavior change in the patch)

combined this gives me the following performance changes for instantiation of Fractions from different arguments (profiled with 1,000,000 random inputs each):

'x/y'-type of strings:
----------------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000   12.162    0.000   27.778    0.000 fractions.py:84(__new__)
new:
  1000000    9.619    0.000   12.428    0.000 frc.py:68(__new__)


scientific notation (e.g., '1.234E-7'):
---------------------------------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000   18.209    0.000   37.341    0.000 fractions.py:84(__new__)
new:
  1000000   15.509    0.000   21.252    0.000 frc.py:68(__new__)


integer strings (e.g. '123456'):
--------------------------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000   11.272    0.000   26.201    0.000 fractions.py:84(__new__)
new:
  1000000    9.347    0.000   12.425    0.000 frc.py:68(__new__)


from other Fractions (to test effect on instantiation of numbers.Rational):
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    4.708    0.000   10.093    0.000 fractions.py:84(__new__)
new:
  1000000    4.835    0.000   10.572    0.000 frc.py:68(__new__)

from int subclass (as another numbers.Rational):
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    3.446    0.000    8.044    0.000 fractions.py:84(__new__)
new:
  1000000    3.795    0.000    8.836    0.000 frc.py:68(__new__)


SUMMARY of this part
====================

+ very significant speedup for instatiation from strings of different forms with (near) negligible effects on instantiation from numbers.Rational.

+ correct parsing of PEP 515-like number strings

- possibly slower error bubbling with invalid input (untested)


CHANGES TO INSTANTIATION FROM FLOAT AND DECIMAL:

- no explicit check for float and decimal in standard constructor; instead, simply try to call as_integer_ratio as last resort (even after checking for numbers.Rational)

- speed up alternate from_float and from_decimal constructor class methods by rearranging type checks and making use of _normalize flag

- import decimal only on demand (when using from_decimal)


Resulting performance changes:

standard constructor with float:
--------------------------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    4.362    0.000   12.452    0.000 fractions.py:84(__new__)
new:
  1000000    6.693    0.000   16.020    0.000 frc.py:68(__new__)

from_float:
-----------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    3.146    0.000   17.769    0.000 fractions.py:193(from_float)
new:
  1000000    2.544    0.000    7.591    0.000 frc.py:205(from_float)

standard constructor with decimal:
--------------------------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    4.672    0.000   20.795    0.000 fractions.py:84(__new__)
new:
  1000000    7.097    0.000   24.526    0.000 frc.py:68(__new__)

from_decimal:
-------------
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
old:
  1000000    5.054    0.000   34.473    0.000 fractions.py:207(from_decimal)
new:
  1000000    2.942    0.000   16.013    0.000 frc.py:220(from_decimal)

SUMMARY of this part:
- standard constructor becomes a bit slower for floats and Decimals
specialized class methods become a lot faster (for Decimal the benchmarks are based on _pydecimal - with the C implementation the percent differences would be even larger)
- eliminates decimal from regularly imported modules
- allows Fraction instantiation from duck-typing classes that provide as_integer_ratio

I hope at least some of this is interesting.
msg280981 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 19:55
Profiling give you only approximate results. In normal execution the effect of your changes can be opposite. Could you please provide benchmarks without using profiling?
msg280995 - (view) Author: Wolfgang Maier (wolma) * Date: 2016-11-16 21:43
sure, I just happened to have the profiling available since I used it to optimize things. Here's similar microbenchmarks using perf:

STRINGS
=======

$ python -m perf timeit -s "from fractions import Fraction" "Fraction('1.23456e-7')"
.....................
Median +- std dev: 17.0 us +- 0.3 us

$ python -m perf timeit -s "from frc import Fraction" "Fraction('1.23456e-7')"
.....................
Median +- std dev: 8.95 us +- 0.16 us


$ python -m perf timeit -s "from fractions import Fraction" "Fraction('234/567')"
.....................
Median +- std dev: 12.6 us +- 0.1 us

$ python -m perf timeit -s "from frc import Fraction" "Fraction('234/567')"
.....................
Median +- std dev: 5.45 us +- 0.16 us


$ python -m perf timeit -s "from fractions import Fraction" "Fraction('123456')"
.....................
Median +- std dev: 12.4 us +- 0.6 us

$ python -m perf timeit -s "from frc import Fraction" "Fraction('123456')"
.....................
Median +- std dev: 5.77 us +- 0.12 us


$ python -m perf timeit -s "from fractions import Fraction; f=Fraction(3/4)" "Fraction(f)"
.....................
Median +- std dev: 4.36 us +- 0.06 us

$ python -m perf timeit -s "from frc import Fraction; f=Fraction(3/4)" "Fraction(f)"
.....................
Median +- std dev: 4.59 us +- 0.07 us


$ python -m perf timeit -s "from fractions import Fraction" -s "class myInt(int): pass" -s "i=myInt(123456)" "Fraction(i)"
.....................
Median +- std dev: 4.04 us +- 0.07 us

$ python -m perf timeit -s "from frc import Fraction" -s "class myInt(int): pass" -s "i=myInt(123456)" "Fraction(i)"
.....................
Median +- std dev: 4.27 us +- 0.06 us


FLOATS
======

$ python -m perf timeit -s "from fractions import Fraction" "Fraction(1.23456e-7)"
.....................
Median +- std dev: 6.30 us +- 0.28 us

$ python -m perf timeit -s "from frc import Fraction" "Fraction(1.23456e-7)"
.....................
Median +- std dev: 8.64 us +- 0.13 us


$ python -m perf timeit -s "from fractions import Fraction" "Fraction.from_float(1.23456e-7)"
.....................
Median +- std dev: 8.68 us +- 0.14 us

$ python -m perf timeit -s "from frc import Fraction" "Fraction.from_float(1.23456e-7)"
.....................
Median +- std dev: 3.37 us +- 0.17 us


DECIMALS (using C implementation this time)
===========================================

$ python -m perf timeit -s "from fractions import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction(d)".....................
Median +- std dev: 6.95 us +- 0.21 us

$ python -m perf timeit -s "from frc import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction(d)"
.....................
Median +- std dev: 8.43 us +- 0.17 us


$ python -m perf timeit -s "from fractions import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction.from_decimal(d)"
.....................
Median +- std dev: 11.6 us +- 0.2 us

$ python -m perf timeit -s "from frc import Fraction; from decimal import Decimal; d=Decimal('123456')" "Fraction.from_decimal(d)"
.....................
Median +- std dev: 4.14 us +- 0.28 us
msg280997 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-11-16 22:03
Might it make sense to make instantiation from numbers.Rational duck typing based as well? Just try to get the numerator and denominator attributes, on AttributeError, skip it and move on. Unless there is some concern that a non-Rational type might have both attributes and not intend them to mean it's a Rational?
msg280998 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-11-16 22:06
Similarly, type checking for int might be replaced with calling operator.index on the input and handling the TypeError; that way, anything that has declared itself logically int-like is handled without explicit type checking at all.
msg280999 - (view) Author: Wolfgang Maier (wolma) * Date: 2016-11-16 22:20
No, I don't think the numeric tower ABC should be replaced by duck-typing. One of the very reasons the fractions module exists is that it showcases how to use the numeric tower. If you want a class to be picked up as a Rational it should be registered as such. Pretty much the same goes for integer-like things. The direct type check for ints only exists to speed up the most obvious case, but anything integer-like should be registered as a numbers.Integral and it will just work with fractions. Beautiful stuff!

This is not the case for as_integer_ratio because that method is not defined in the numeric tower and this is why I think duck-typing could be of interest here (I'm not sure though whether its worth the somewhat decreased performance). My reasoning is that the standard constructor is anyway slower for floats and ints than the specialized classmethods for the two so people who really care about speed here (probably not too many) can just use the classmethods.
msg305611 - (view) Author: Eric Wieser (Eric.Wieser) * Date: 2017-11-05 23:21
> allows Fraction instantiation from duck-typing classes that provide as_integer_ratio

This would allow the numpy `np.floating` types to take part in `Fraction` conversion as well, which would be great.

As far as I can tell, `Decimal` already follows this duck-typing, so it would be a shame for other modules not to.

Perhaps an `num, den = operator.rational(x)` is needed to unify this code across the places that coerce rational numbers.
msg310470 - (view) Author: Stefan Behnel (scoder) * Date: 2018-01-23 05:55
Not sure if it's relevant for this specific change, but here's a benchmark that you could use for Fractions: issue22458
msg313208 - (view) Author: Stefan Behnel (scoder) * Date: 2018-03-04 10:57
Just FYI and as further motivation, I reimplemented this dedicated parser for quicktions (in Cython, so the timings and speedups are not comparable).

https://github.com/scoder/quicktions/commit/cc034e07325ec492decdb7b1bcca69246cc780fd

I was able to get another quite visible improvement by caching the values of "10 ** shift" for 0 <= shift < 58 in a tuple. Higher values are very unlikely in practice, and the memory size of a tuple with 58 values gives a nice multiple of the usual cache line size. (I originally stored 64 values in my commit but then cut it down later.)

https://github.com/scoder/quicktions/commit/c20add53dc4936d70eb0daa370946a600adddca9

I suspect that the difference won't be as big for the Python implementation, but it still seems worth a try.

The overall speedup that I got, compared to the initial regex implementation, is 50-70%.

[regex]  $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("153456/789344")'
200000 loops, best of 5: 1.19 usec per loop
[native] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("153456/789344")'
500000 loops, best of 5: 593 nsec per loop

[regex]  $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("15.3456789E+4")'
100000 loops, best of 5: 2.3 usec per loop
[native] $ python3.7 -m timeit -s 'from quicktions import Fraction as F' 'F("15.3456789E+4")'
500000 loops, best of 5: 667 nsec per loop

It could be even higher if I additionally moved the int() integer parsing into Cython. Might try that at some point. But that's also not something that concerns the implementation in CPython.
msg313741 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-13 11:14
[Eric Wieser]

> This would allow the numpy `np.floating` types to take part in `Fraction` conversion as well, which would be great.

I'm confused: as far as I can tell, the NumPy floating-point types don't implement `as_integer_ratio`. (Except for `np.float64`, which subclasses `float` and so inherits `is_integer_ratio` from it.) Or are you suggesting that _if_ they were to implement `as_integer_ratio` at some point in the future, then they'd become compatible with `Fraction`?
msg313753 - (view) Author: Eric Wieser (Eric.Wieser) * Date: 2018-03-13 14:44
> Are you suggesting that _if_ they were to implement `as_integer_ratio` at some point in the future, then they'd become compatible with `Fraction`

Yes, exactly. Conversely, there's little gain in implementing it _until_ `Fraction` supports calling it.
msg313786 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-03-13 21:54
FYI, adding int.as_integer_ratio() is being taken care in a separate issue:  https://bugs.python.org/issue33073 . I've set it aside for a beginning core-dev mentee to work on because it is simple and self-contained.  Pretty much all the related work can be done here.
History
Date User Action Args
2018-10-07 14:51:24steven.dapranosetnosy: + steven.daprano
2018-03-13 21:54:21rhettingersetnosy: + rhettinger
messages: + msg313786
2018-03-13 14:44:28Eric.Wiesersetmessages: + msg313753
2018-03-13 11:14:31mark.dickinsonsetmessages: + msg313741
2018-03-04 10:57:02scodersetmessages: + msg313208
versions: + Python 3.8, - Python 3.7
2018-01-23 05:55:11scodersetnosy: + scoder
messages: + msg310470
2017-11-05 23:21:38Eric.Wiesersetnosy: + Eric.Wieser
messages: + msg305611
2016-11-16 22:20:31wolmasetmessages: + msg280999
2016-11-16 22:06:03josh.rsetmessages: + msg280998
2016-11-16 22:03:05josh.rsetnosy: + josh.r
messages: + msg280997
2016-11-16 21:43:11wolmasetmessages: + msg280995
2016-11-16 19:55:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg280981
2016-11-16 18:50:17r.david.murraysetnosy: + mark.dickinson
2016-11-16 18:06:27wolmacreate