classification
Title: Optimize Fraction() and statistics.mean()
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: jdemeyer, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2019-08-19 06:17 by serhiy.storchaka, last changed 2019-08-26 11:10 by jdemeyer. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 15329 closed serhiy.storchaka, 2019-08-19 06:19
Messages (15)
msg349939 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-19 06:17
The proposed PR optimizes the Fraction constructor and statistics.mean() (and several other statistics functions) by using a private helper implemented in C which abstracts converting a number to an integer ratio.

$ ./python -m timeit -s "from fractions import Fraction as F" "F(123)"
500000 loops, best of 5: 655 nsec per loop
500000 loops, best of 5: 749 nsec per loop

$ ./python -m timeit -s "from fractions import Fraction as F" "F(1.23)"
200000 loops, best of 5: 1.29 usec per loop
200000 loops, best of 5: 1.03 usec per loop

$ ./python -m timeit -s "from fractions import Fraction as F; f = F(22, 7)" "F(f)"
200000 loops, best of 5: 1.17 usec per loop
500000 loops, best of 5: 899 nsec per loop

$ ./python -m timeit -s "from fractions import Fraction as F; from decimal import Decimal as D; d = D('1.23')" "F(d)"
200000 loops, best of 5: 1.64 usec per loop
200000 loops, best of 5: 1.29 usec per loop


$ ./python -m timeit -s "from statistics import mean; a = [1]*1000" "mean(a)"
500 loops, best of 5: 456 usec per loop
1000 loops, best of 5: 321 usec per loop

$ ./python -m timeit -s "from statistics import mean; a = [1.23]*1000" "mean(a)"
500 loops, best of 5: 645 usec per loop
500 loops, best of 5: 659 usec per loop

$ ./python -m timeit -s "from statistics import mean; from fractions import Fraction as F; a = [F(22, 7)]*1000" "mean(a)"
500 loops, best of 5: 637 usec per loop
500 loops, best of 5: 490 usec per loop

$ ./python -m timeit -s "from statistics import mean; from decimal import Decimal as D; a = [D('1.23')]*1000" "mean(a)"
500 loops, best of 5: 946 usec per loop
500 loops, best of 5: 915 usec per loop

There is a 14% regression in creating a Fraction from an integer, but for non-integer numbers it gains 25% speed up. The effect on statistics.mean() varies from insignificant to +40%.
msg349943 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-19 08:37
> There is a 14% regression in creating a Fraction from an integer

Isn't that the main use case? I suggest to keep the special case for 'int' as fast path to avoid this regression.
msg349963 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-19 18:29
Mark, I don't think the math module API should be expanded for as_integer_ratio().  ISTM that small and probably unimportant opimizations shouldn't spill-over into API feature creep.  What do you think?
msg349965 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-19 19:42
> ISTM that small and probably unimportant opimizations shouldn't spill-over into API feature creep.

The way I see it, the optimization is besides the point here. Regardless of performance, the added function is a useful feature to have to avoid forcing people to reinvent the wheel.  For example, would you want the exact same code duplicated for fractions.Fraction() and for statictics.mean()?

See also #37836 in case you didn't know about that issue.
msg349966 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-19 19:56
This issue if for optimization only. It does not expand the math module API.

Adding public math.as_integer_ratio() has other benefits (it allows to simplify the user code), but it is a different issue (see issue37822).
msg349967 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-19 20:00
AFAICT, no end-user has ever requested this ever.   Despite your insistence, we really don't have to clutter the math module with this.

We sometimes do have two or three repetitions of logic in the standard library; however, our needs tend to be much different from end-users. Also, we tend to benefit from the loose coupling and not turning every little detail into a published cross-module API.

Now that we've propagated the as_integer_ratio() into bool, int, float,  Decimal, and Fraction, I propose we stop there.  This micro-problem to too small to warrant adding yet more machinery and API creep.
msg349995 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-20 08:43
> AFAICT, no end-user has ever requested this ever.

What do you mean with "this"?

(A) A public function like math.as_integer_ratio()

(B) Using as_integer_ratio() in the fractions.Fraction() constructor

(C) The optimization of the fractions.Fraction() constructor

For SageMath, (B) would be very useful. See https://discuss.python.org/t/pep-3141-ratio-instead-of-numerator-denominator/2037/24?u=jdemeyer (replace __ratio__ by as_integer_ratio)
msg349996 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-20 09:03
> our needs tend to be much different from end-users

This issue is about fractions and statistics, which are closer to typical user libraries than CPython libraries. In fact, both could easily be packages on PyPI instead of part of the standard library.

> no end-user has ever requested this ever.

If math.as_integer_ratio() existed, probably SageMath would use it. On the other hand, the code for math.as_integer_ratio() is simple enough that SageMath could easily implement it if needed.
msg350004 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2019-08-20 14:11
> This issue if for optimization only. It does not expand the math module API.

I think it does, though. The PR adds something to the math module that's tested, that needs to be maintained for other modules to work, and that's imported for use in another module; that smells like an element of the API to me, even with the leading underscore in the name. Other Python implementations would also need to implement math._as_integer_ratio for the pure Python fractions code to continue working. If we're going to expand the math module API to include such a function (and I don't think we should), we should do it properly: remove the leading underscore, and add documentation. But as you say, that's a separate issue.

I'm with Raymond here; I don't think this change is desirable for the math module, either with or without the leading underscore in the name.
msg350192 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-22 15:05
May I propose PR 15327 as alternative? It solves some of the same issues as the PR on this issue, in particular supporting arbitrary objects with as_integer_ratio(). It also improves performance somewhat for certain inputs, but that's more by accident.
msg350245 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-23 02:14
[Mark]
> I'm with Raymond here; I don't think this change is 
> desirable for the math module, either with or without 
> the leading underscore in the name.

[Jeroen]
> May I propose PR 15327 as alternative?

I'll take a look soonish.  Since it has its own BPO, I'll go ahead and close this.

FWIW, the entire point of us having recently added as_integer_ratio() methods to so many concrete classes is to avoid the need for helper functions in favor of a simple try/except around a single call.  Please don't try to code around that solution.
msg350330 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-23 19:53
> FWIW, the entire point of us having recently added as_integer_ratio() methods to so many concrete classes is to avoid the need for helper functions in favor of a simple try/except around a single call.

But what about PEP 3141? The fractions.Fraction constructor accepts numbers.Rational instances, which do not necessarily have an as_integer_ratio() method. We can't just drop support for that (*). So in practice you do need to check both as_integer_ratio() and the PEP 3141 numerator/denominator properties. It seems useful to have a helper function for this.

(*) Unless you want to deprecate PEP 3141. This may be less crazy than it sounds, especially given Guido van Rossum's reaction on https://discuss.python.org/t/pep-3141-ratio-instead-of-numerator-denominator/2037/25?u=jdemeyer
msg350358 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-24 10:09
> I'm with Raymond here; I don't think this change is desirable for the math module, either with or without the leading underscore in the name.

Could you please explain what is wrong in adding a helper function which will help to convert different types of numbers to integer ratio correctly?

Currently you need to write a complex code if you need to convert a general number to integer ratio:

    try:
        as_integer_ratio = x.as_integer_ratio
    except AttributeError:
        n = x.numerator
        d = x.denominator
    else:
        n, d = as_integer_ratio()

(and maybe even more complex if you want to add fast path for int).
msg350476 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-08-26 01:42
On Sat, Aug 24, 2019 at 10:09:40AM +0000, Serhiy Storchaka wrote:

> Could you please explain what is wrong in adding a helper function 
> which will help to convert different types of numbers to integer ratio 
> correctly?
> 
> Currently you need to write a complex code if you need to convert a 
> general number to integer ratio:

When I first wrote the statistics module, I found that converting to
fractions was a bottleneck. I spent some time tweaking the private 
_exact_ratio() function which is why the code is so hairy.

Being able to simplify and speed up that conversion would be great, and 
I thank Serhiy for looking at this.

But I am concerned about the API, writing a *private* helper function in 
the math module, but then using it in the statistics module as if it 
were public. That will surely encourage people to do the same, and 
before long this private helper will have to be treated as public.
msg350525 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-26 11:10
For the record: making a public math.as_integer_ratio() function was rejected at #37822.
History
Date User Action Args
2019-08-26 11:10:58jdemeyersetmessages: + msg350525
2019-08-26 01:42:30steven.dapranosetmessages: + msg350476
2019-08-24 10:09:40serhiy.storchakasetmessages: + msg350358
2019-08-23 19:53:59jdemeyersetmessages: + msg350330
2019-08-23 02:14:53rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg350245

stage: patch review -> resolved
2019-08-22 15:05:52jdemeyersetmessages: + msg350192
2019-08-20 14:11:57mark.dickinsonsetmessages: + msg350004
2019-08-20 09:03:53jdemeyersetmessages: + msg349996
2019-08-20 08:43:27jdemeyersetmessages: + msg349995
2019-08-19 20:00:36rhettingersetmessages: + msg349967
2019-08-19 19:56:10serhiy.storchakasetmessages: + msg349966
2019-08-19 19:42:13jdemeyersetmessages: + msg349965
2019-08-19 18:29:25rhettingersetassignee: mark.dickinson
messages: + msg349963
2019-08-19 08:37:37jdemeyersetnosy: + jdemeyer
messages: + msg349943
2019-08-19 06:19:00serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request15047
2019-08-19 06:17:30serhiy.storchakacreate