Optimize rational arithmetics #87586
Comments
fractions.py uses naive algorithms for doing arithmetic. It may be worth implementing less trivial versions for addition/subtraction and multiplication/division. Some projects (e.g. SymPy here: sympy/sympy#12656) reinvent the wheel instead.
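For context, the schoolbook versions being referred to look roughly like this (paraphrased from the pre-3.11 fractions.py and shown as free functions rather than methods, so this is an illustration, not the verbatim stdlib source): the full cross products are formed first, and the constructor then reduces them with a single gcd over potentially huge operands.

from fractions import Fraction

def naive_add(a, b):
    # a + b: cross-multiply, then let Fraction() reduce the big result once.
    da, db = a.denominator, b.denominator
    return Fraction(a.numerator * db + b.numerator * da, da * db)

def naive_mul(a, b):
    # a * b: multiply straight through, then reduce once.
    return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)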
Here's some code to try out:

from fractions import Fraction
import operator
import math

class Henrici(Fraction):
    'Reformulate _mul to reduce the size of intermediate products'

    def _mul(a, b):
        a_n, a_d = a.numerator, a.denominator
        b_n, b_d = b.numerator, b.denominator
        d1 = math.gcd(a_n, b_d)
        a_n //= d1
        b_d //= d1
        d2 = math.gcd(b_n, a_d)
        b_n //= d2
        a_d //= d2
        result = Fraction(a_n * b_n, a_d * b_d, _normalize=False)
        assert math.gcd(a_n * b_n, a_d * b_d) == 1 and a_d * b_d >= 0
        return result

    # Route the * operator through the reformulated _mul.
    __mul__, __rmul__ = Fraction._operator_fallbacks(_mul, operator.mul)

assert Henrici(10, 3) * Henrici(6, 5) == Henrici(4, 1)
Note that Fraction arithmetic has a huge administrative overhead. The cost of the underlying multiplications and divisions won't dominate the total time until the numerators and denominators are very large. For the proposed optimization, this implies that the cost of the extra Python steps to implement the optimization will be negligible. The benefits of the optimization are similarly attenuated.

Update to the experimentation code: add guards for the relatively prime case.

class Henrici(Fraction):
    'Reformulate _mul to reduce the size of intermediate products'

    def _mul(a, b):
        a_n, a_d = a.numerator, a.denominator
        b_n, b_d = b.numerator, b.denominator
        d1 = math.gcd(a_n, b_d)
        if d1 > 1:
            a_n //= d1
            b_d //= d1
        d2 = math.gcd(b_n, a_d)
        if d2 > 1:
            b_n //= d2
            a_d //= d2
        result = Fraction(a_n * b_n, a_d * b_d, _normalize=False)
        assert math.gcd(a_n * b_n, a_d * b_d) == 1 and a_d * b_d >= 0
        return result

    # As above, route the * operator through the reformulated _mul.
    __mul__, __rmul__ = Fraction._operator_fallbacks(_mul, operator.mul)
The cost to the common case for small components is about 20%:

$ python3.10 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
200000 loops, best of 11: 1.8 usec per loop
$ python3.10 -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 2.14 usec per loop

Without the guards the incremental cost drops to 10%:

$ python3.10 -m timeit -r11 -s 'from patched_noguards import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 2.02 usec per loop
I have similar timings (for a draft version of the PR, see f.patch) as in the last comment, though the small-denominator overhead seems to be bigger (~20%). On the other hand, here are timings for bigger denominators: ...

It's not so clear what "are very large" means; that could be defined here. BTW, 10**6 denominators are (very!) small for the use case mentioned above (a CAS package).
Looks worth it :-)
Personally, I'd prefer to keep the code simplicity, and the speed for small inputs here. Python's needs aren't the same as SymPy's needs or SAGE's needs, and not all of the fractions.Fraction use-cases involve summing lots of values with incompatible denominators.
Could you give some idea of the crossover point for a single addition? At roughly what size numerator/denominator do we start seeing a performance benefit?
It's not going to be complicated so much, and the algorithms are well known.

Speed loss is not so big, 10-20%.

So, for which needs will it serve? Sorry, I can't suggest an application which does use the builtin Fraction heavily. There is one exception I've found: stdlib's statistics module uses Fractions internally.

No need for lots of values (e.g. 1000): the denominator of the sum will grow very fast (see the small illustration after the timings below):
$ ./python -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,31011021112)' -s 'b=F(86,11011021115)' 'a + b'
20000 loops, best of 11: 12.4 usec per loop
$ ./python -m timeit -r11 -s 'from patched import Fraction as F' -s 'a=F(10,31011021112)' -s 'b=F(86,11011021115)' 'a + b'
20000 loops, best of 11: 12.5 usec per loop
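As a small illustration of the growth claim above (an ad-hoc snippet, not taken from the thread): with pairwise-coprime denominators nothing cancels, so the denominator of a running sum is the product of all denominators seen so far.

from fractions import Fraction

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # pairwise-coprime denominators
total = sum(Fraction(1, p) for p in primes)
print(total.denominator)            # 6469693230 == 2 * 3 * 5 * ... * 29
print(len(str(total.denominator)))  # already 10 digits after only ten additions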
That's an interesting example: it does indeed satisfy the "sum of a lot of values" part, but not the "incompatible denominators" part. :-) The typical use there is that those fractions have been converted from floats, and so all denominators will be a smallish power of two. So we don't encounter the same coefficient explosion problem that we do when summing fractions with unrelated denominators. I have many similar use-cases in my own code, where numerators and denominators don't tend to ever get beyond a few hundred digits, and usually not more than tens of digits. Thanks for the timings! So assuming that wasn't a specially-chosen best case example, the crossover is somewhere around numerators and denominators of ten digits or so.
For me, the big difference is that the current code is obviously correct at a glance, while the proposed code takes study and thought to make sure that no corner cases are missed. Shrug. Put me down as -0.
I'm surprised to hear that the "typical use-case" of Fraction is fractions converted from floats. Do you have evidence in the wild to support that? I would expect any application that uses fractions "generically" to run into the same sorts of problems SymPy does.

The issue is that the sum or product of two unrelated fractions has a denominator that is ~ the product of the denominators of each term. So they tend to grow large, unless there is some structure in the terms that results in lots of cancellation. That's why real-world numerical code typically doesn't use exact arithmetic, but there are legitimate use-cases for it (computer algebra being one). This actually also applies even if the denominators are powers of 2. That's why arbitrary-precision floating-point numbers like Decimal or mpmath.mpf limit the precision, or effectively, the power of 2 in the denominator.

By the way, the "algorithm" here really isn't that complicated. I didn't even realize it had a name. The idea is that for a/b * c/d, if a/b and c/d are already in lowest terms, then the only cancellation that can happen is between a and d or between c and b. So instead of computing gcd(a*c, b*d), we only compute gcd(a, d) and gcd(c, b) and cancel them off the corresponding terms. It turns out to be faster to take two gcds of smaller numbers than one gcd of big ones.

The algorithm for addition is a bit more complicated, at least to see that it is correct, but is still not that bad (the paper linked in the OP explains it clearly in one paragraph). It's far less complicated than, for example, Lehmer's gcd algorithm (which is implemented in math.gcd).
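A minimal sketch of that addition scheme, assuming both inputs are already in lowest terms with positive denominators (the free-function form and the name add_reduced are made up for illustration; this is not the PR's code):

from math import gcd

def add_reduced(a, b, c, d):
    """Return a/b + c/d as a reduced (numerator, denominator) pair,
    assuming gcd(a, b) == gcd(c, d) == 1 and b, d > 0."""
    g = gcd(b, d)
    if g == 1:
        # Denominators share no factor: the schoolbook result is already reduced.
        return a * d + c * b, b * d
    b1, d1 = b // g, d // g
    t = a * d1 + c * b1
    # Any common factor of t and the denominator g*b1*d1 must divide g.
    g2 = gcd(t, g)
    return t // g2, b1 * (d // g2)

assert add_reduced(1, 6, 1, 10) == (4, 15)   # 1/6 + 1/10 = 8/30 = 4/15

The point is that gcd is never taken over the full cross products; its arguments stay about the size of the original numerators and denominators.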
On Sun, Mar 07, 2021 at 10:34:24PM +0000, Aaron Meurer wrote:

For the statistics module - that may be true. Unfortunately, no ...

Rather "algorithms"; everything is there in the second volume of Knuth's TAOCP.

Or Karatsuba multiplication. BTW, small-denominator performance may be restored (at least partially).
On Sun, Mar 07, 2021 at 12:16:36PM +0000, Mark Dickinson wrote:

But there are no limits on using Fractions for input, e.g. there are ...

No, but this will handle the first branch.

That may be fixed by keeping relevant references right in the code, not ...
I agree with everyone ;-) That is, my _experience_ matches Mark's: as a more-or-less "numeric expert", I use Fraction in cases where it's already fast enough. Python isn't a CAS, and, e.g., in pure Python I'm not doing things like computing or composing power series with rational coefficients (but routinely do such stuff in a CAS). It's usually just combinatorial probabilities in relatively simple settings, and small-scale computations where float precision would be fine except I don't want to bother doing error analysis first to ensure things like catastrophic cancellation can't occur.

On the other hand, the proposed changes are bog-standard optimizations for implementations of rationals, and can pay off very handsomely at relatively modest argument sizes.

So I'm +0. I don't really mind the slowdown for small arguments because, as above, I just don't use Fraction for extensive computation. But the other side of that is that I won't profit much from optimizations either, and while the optimizations for * and / are close to self-evident, those for + and - are much subtler. Code obscurity imposes ongoing costs of its own.

WRT which, I added Python's Karatsuba implementation and regret doing so. I don't want to take it out now (it's already in ;-) ), but it added quite a pile of delicate code to what _was_ a much easier module to grasp. People who need fast multiplication are still far better off using gmpy2 anyway (or fiddling Decimal precision - Stefan Krah implemented "advanced" multiplication schemes for that module).
On Tue, Mar 09, 2021 at 05:11:09AM +0000, Tim Peters wrote:

In fact, these optimizations will pay off faster (wrt the denominator sizes) ...

Sorry for the off-topic:

(And was much more useless, even as a fallback.) But in the end - I agreed, you can't outperform professional bigint implementations.

(Another strange Python "feature", IMHO. Why the custom bigint ...)
bpo-21922 lists several concerns, and best I know they all still apply. As a practical matter, I expect the vast bulk of core Python developers would reject a change that sped large int basic arithmetic by a factor of a billion if it slowed down basic arithmetic on native machine-size ints by even half a percent.
A possible resolution to this issue might be augmenting https://docs.python.org/3/library/fractions.html#module-fractions |
Terry, we could do that, but the opposition here isn't strong, and is pretty baffling anyway ;-) : the suggested changes are utterly ordinary for implementations of rationals, require very little code, are not delicate, and are actually straightforward to see are correct (although unfamiliarity can be an initial barrier - e.g., if you don't already know that after

g = gcd(a, b)
a1 = a // g
b1 = b // g

it's necessarily true that a1 and b1 are coprime, a key reason for why the transformation is correct will be lost on you - but it's also very easy to prove that claim once you're told that it is a key here). The OP is right that "we" (at least Mark, and Raymond, and I) have routinely checked in far subtler optimizations in various areas.
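For completeness, the proof of that coprimality claim really is short: if some d > 1 divided both a1 = a // g and b1 = b // g, then d*g would divide both a and b, making d*g a common divisor larger than g and contradicting g = gcd(a, b); hence gcd(a1, b1) = 1.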
There are no alternative implementations. SymPy's PythonRational (AFAIR, this class is not in the default namespace) is an internal fallback solution, for the case when no gmpy2 is available. Maybe we should list gmpy2 everywhere people expect fast bigints/rationals (e.g. the int docs, the math module, Fraction and so on), so they will not be disappointed...

Much more complex (IMHO) code was accepted (there were examples for C, but the limit_denominator() method is an example for Python code, from the same module!). In fact, it's pretty obvious that the output fractions are equal to the middle-school versions. The non-trivial part may be why they are normalized. I hope this is now covered by comments.
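To spell that equivalence out for addition (a short derivation, not part of the original comment: take a/b and c/d in lowest terms, g = gcd(b, d), b = g*b1, d = g*d1), multiplying the optimized result's numerator and denominator by g recovers the middle-school fraction:

(a*d1 + c*b1) / (g*b1*d1)
    = g*(a*d1 + c*b1) / (g*b1 * g*d1)
    = (a*d + c*b) / (b*d)

So the only thing left to justify is the normalization step, i.e. why dividing by g2 = gcd(a*d1 + c*b1, g) is enough to put the result in lowest terms.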
Thanks, all! This has been merged now. If someone wants to continue pursuing things left hanging, I'd suggest opening a different BPO report.
On Mon, Mar 22, 2021 at 02:35:59AM +0000, Tim Peters wrote:

Tim, if you are talking about micro-optimizations for small components ...

Thanks to all reviewers and commenters.
If experience is any guide, nothing about anything here will go smoothly ;-)

For example, setting up a module global ... Or wrt changing properties to private attributes: that speeds some things but slows others - and, unless I missed it, nobody who wrote that code to begin with said a word about why it was done that way.

I'm not going to "pre-bless" shortcuts in an area where everything so far has been more contentious than it "should have been" (to my eyes). Opening a BPO report is a trivial effort. Skipping NEWS, or not, depends on how it goes.
On Mon, Mar 22, 2021 at 04:34:32AM +0000, Tim Peters wrote:

Looking at the stdlib, I would just import gcd.

... and less readable. Not sure if the speedup will be noticeable. But more important is ...

Yes, I'll dig into the history. The commen^Wdocstring doesn't explain ...

Sure, but such a report will require a patch to be ...
This report is closed. Please open a different report. We've already demonstrated that, as predicted, nothing can be said here without it being taken as an invitation to open-ended discussion. So it goes, but it doesn't belong on _this_ report anymore.
Just FYI, I applied the same changes to the quicktions [1] module, a Cython-compiled (and optimised) version of fractions.Fraction.

[1] https://github.com/scoder/quicktions/

The loss in performance for small values is much higher there, almost 2x for the example given (compared to 10-20% for CPython):

$ python3.8 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'a=F(10,3)' -s 'b=F(6, 5)' 'a * b'
100000 loops, best of 11: 1.94 usec per loop

Original:
Patched:

For the larger values example, the gain is tremendous, OTOH:

$ python3.8 -m timeit -r11 -s 'from fractions import Fraction as F' -s 'import random' -s 'n = [random.randint(1, 1000000) for _ in range(1000)]' -s 'd = [random.randint(1, 1000000) for _ in range(1000)]' -s 'a=list(map(lambda x: F(*x), zip(n, d)))' 'sum(a)'
2 loops, best of 11: 150 msec per loop

Original:
Patched:

I'll have to see if the slowdown can be mitigated somehow. Interestingly enough, the telco benchmark seems to benefit slightly from this:

Original:
Patched:
Yes, for small components this is a known slowdown. I'm trying to mitigate that case in #25518. Except for the mixed mode (Fraction + int), this matches the original performance.