classification
Title: Speed up mod/divmod/floordiv for Fraction type
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: josh.r, mark.dickinson, scoder, serhiy.storchaka
Priority: normal Keywords: patch, patch, patch

Created on 2018-12-26 09:39 by scoder, last changed 2019-01-02 12:42 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 11322 merged scoder, 2018-12-26 09:48
PR 11322 merged scoder, 2018-12-26 09:48
PR 11322 merged scoder, 2018-12-26 09:48
Messages (12)
msg332533 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-12-26 09:39
Spelling out the numerator/denominator calculation in the __mod__ special method, and actually implementing __divmod__, speeds up both operations by 2-3x. This is due to avoiding repeated Fraction instantiation and normalisation, as well as less arithmetic operations.

$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a%b'
50000 loops, best of 5: 9.53 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a%3'
50000 loops, best of 5: 6.61 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a, b)'
20000 loops, best of 5: 14.1 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a, 3)'
20000 loops, best of 5: 10.2 usec per loop

$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a%b'          
100000 loops, best of 5: 2.96 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a%3'
100000 loops, best of 5: 2.78 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a, b)'
100000 loops, best of 5: 3.93 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a, 3)'
50000 loops, best of 5: 3.82 usec per loop
msg332537 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-12-26 12:30
Similarly, I think "//" (__floordiv__) should be implemented using integer operations rather than math.floor():

    (a.numerator * b.denominator) // (b.numerator * a.denominator)

Thoughts?
msg332538 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-12-26 12:44
Motivation for the latter:

$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a // b'
100000 loops, best of 5: 3.7 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a // 3'
100000 loops, best of 5: 3.49 usec per loop

$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a // b'
500000 loops, best of 5: 899 nsec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'a // 3'
500000 loops, best of 5: 729 nsec per loop
msg332547 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-26 15:09
Please make several additional tests, and ensure that there is no regression.

1. Test with fractions with the same large denominator (for example 2**50, 2**100, 10**30, 3**50, factorial(30), or a large pseudo-primary number) and small and large numerators.

2. Test with fractions with the same large numerator (as above) and small and large denominators.

3. Test with fractions with random numerators and denominators and find worst cases (in which the optimization effect is the smallest or negative).
msg332553 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-12-26 16:01
Sure, I can add tests, but I wonder what kind of regression you expect. The algorithm is still the same as before, it's just implemented more efficiently. It does trade a bit of memory for the speed, though, since there is no longer an intermediate normalisation step, and therefore the integers can get larger during the calculation. Shouldn't make a big difference in practice, though. We are talking about bytes, not megabytes here.
msg332554 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-26 16:20
I want to check whether removing the normalization step has a negative performance effect larger than the time spent on normalization.
msg332557 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-12-26 18:36
Thanks for your review and ideas, Serhiy. I added a couple of test cases, but failed to find any case where the new implementation is not much faster.

I also tried "divmod(n_div, d_div)" for implementing __divmod__(), and the results are mixed, e.g.

Arithmetic operators:
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a,b)'
100000 loops, best of 5: 3.11 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**15+1, 10**27+1), F(10**9-1, 10**7-1)' 'divmod(a, b)'
100000 loops, best of 5: 3.48 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**62-1)' 'divmod(a, b)'
20000 loops, best of 5: 17.7 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**89-1, 10**611-1), F(10**350+1, 10**207+1)' 'divmod(a, b)'
20000 loops, best of 5: 18.2 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**612-1)' 'divmod(a, b)'
5000 loops, best of 5: 34.4 usec per loop

divmod():
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a,b)'
100000 loops, best of 5: 3.04 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**15+1, 10**27+1), F(10**9-1, 10**7-1)' 'divmod(a, b)'
100000 loops, best of 5: 3.56 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**62-1)' 'divmod(a, b)'
20000 loops, best of 5: 17.3 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**89-1, 10**611-1), F(10**350+1, 10**207+1)' 'divmod(a, b)'
20000 loops, best of 5: 18.2 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**612-1)' 'divmod(a, b)'
10000 loops, best of 5: 31.7 usec per loop

Current master, for comparison:
$ ./python -m timeit -s 'from fractions import Fraction as F; a = F(-7, 3); b = F(3, 2)' 'divmod(a,b)'
20000 loops, best of 5: 14.1 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**15+1, 10**27+1), F(10**9-1, 10**7-1)' 'divmod(a, b)'
20000 loops, best of 5: 16 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**62-1)' 'divmod(a, b)'
5000 loops, best of 5: 61.2 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**89-1, 10**611-1), F(10**350+1, 10**207+1)' 'divmod(a, b)'
5000 loops, best of 5: 65.3 usec per loop
$ ./python -m timeit -s 'from fractions import Fraction as F; a, b = F(10**350+1, 10**207+1), F(10**89-1, 10**612-1)' 'divmod(a, b)'
2000 loops, best of 5: 120 usec per loop

Definitely not an obvious decision, although there is a tendency towards faster execution for very large numbers. Whether it's faster or slower would probably depend on the data and the application at hand.
I could live with either choice, but would use divmod() for now since it simplifies the implementation.
msg332659 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-12-28 14:10
divmod imposes higher fixed overhead in exchange for operating more efficiently on larger values.

Given the differences are small either way, and using divmod reduces scalability concerns for larger values (which are more likely to occur in code that delays normalization), I'd be inclined to stick with the simpler divmod-based implementation.
msg332669 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-28 17:57
Using divmod() makes the case of small integers 2-3% slower, but makes the case of large integers more significantly faster. And since the code with divmod() is simpler, I think it is worth to use it.

The Fraction class also serves educational goals. The simpler code is better. The proposed patch makes the code slightly more complex, but not too much. I think it's an affordable price for such degree of speed up.
msg332711 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-12-29 11:21
> since the code with divmod() is simpler, I think it is worth to use it.

+1 from me, and +1 on the PR in general.
msg332867 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-02 12:22
New changeset 3a374e0c5abe805667b71ffaaa7614781101ff4c by Serhiy Storchaka (Stefan Behnel) in branch 'master':
bpo-35588: Speed up mod, divmod and floordiv operations for Fraction type (#11322)
https://github.com/python/cpython/commit/3a374e0c5abe805667b71ffaaa7614781101ff4c
msg332868 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-02 12:42
Thank you for your contribution Stefan!

I am sorry for an awful commit message. I edited it, but due to some bug on GitHub the edited commit message was ignored after pressing the "Try merge" button. I heart this is not the first time of ignoring the edited commit message.
History
Date User Action Args
2019-01-02 12:42:03serhiy.storchakasetstatus: open -> closed
messages: + msg332868

keywords: patch, patch, patch
resolution: fixed
stage: patch review -> resolved
2019-01-02 12:22:09serhiy.storchakasetmessages: + msg332867
2018-12-29 11:21:51mark.dickinsonsetkeywords: patch, patch, patch

messages: + msg332711
2018-12-28 17:57:31serhiy.storchakasetkeywords: patch, patch, patch

messages: + msg332669
2018-12-28 14:10:03josh.rsetkeywords: patch, patch, patch
nosy: + josh.r
messages: + msg332659

2018-12-26 18:36:11scodersetmessages: + msg332557
2018-12-26 16:20:32serhiy.storchakasetkeywords: patch, patch, patch

messages: + msg332554
2018-12-26 16:01:16scodersetmessages: + msg332553
2018-12-26 15:09:52serhiy.storchakasetkeywords: patch, patch, patch

messages: + msg332547
2018-12-26 12:44:18scodersetmessages: + msg332538
title: Speed up mod/divmod for Fraction type -> Speed up mod/divmod/floordiv for Fraction type
2018-12-26 12:30:19scodersetmessages: + msg332537
2018-12-26 10:01:35scodersetnosy: + mark.dickinson, serhiy.storchaka
2018-12-26 09:48:56scodersetkeywords: + patch
stage: patch review
pull_requests: + pull_request10584
2018-12-26 09:48:54scodersetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10583
2018-12-26 09:48:50scodersetkeywords: + patch
stage: (no value)
pull_requests: + pull_request10582
2018-12-26 09:39:32scodercreate