
Author rhettinger
Recipients rhettinger, steven.daprano, tim.peters
Date 2019-02-05.23:59:20
The current mean() function makes heroic efforts to achieve last-bit accuracy and, when possible, to retain the data type of the input.

What is needed is an alternative with a simpler signature: one that is much faster, that is highly accurate without demanding perfection, and that does what people usually expect mean() to do, the same as their calculators or numpy.mean():

   import math
   from typing import Sequence

   def fmean(seq: Sequence[float]) -> float:
       return math.fsum(seq) / len(seq)
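As a quick illustration (a sketch, not part of the issue) of why math.fsum() is the right building block here: fsum() tracks exact partial sums, so the accumulated rounding error of naive summation disappears.

```python
import math

# Naive summation accumulates representation error from 0.1:
print(sum([0.1] * 10))        # 0.9999999999999999

# fsum() is correctly rounded over the whole sequence:
print(math.fsum([0.1] * 10))  # 1.0
```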

On my current 3.8 build, this code gives roughly a 44x speed-up (6.8 msec versus 155 usec per call, per the timings below).   Note that having a fast fmean() function is important in resampling statistics, where the mean() is typically called many times: 

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from statistics import mean' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
50 loops, best of 11: 6.8 msec per loop

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from math import fsum' -s 'mean=lambda seq: fsum(seq)/len(seq)' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
2000 loops, best of 11: 155 usec per loop
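For context, here is a sketch (hypothetical setup, not from the issue) of the kind of bootstrap-resampling loop where the mean is recomputed thousands of times, so a fast fmean() pays off directly:

```python
import math
from random import choices, random, seed

def fmean(seq):
    # The proposed fast mean: one correctly rounded sum, one division.
    return math.fsum(seq) / len(seq)

seed(8675309)
data = [random() for _ in range(1_000)]

# 10,000 bootstrap resamples -> 10,000 calls to fmean()
means = sorted(fmean(choices(data, k=len(data))) for _ in range(10_000))

# Percentile-based 90% confidence interval for the mean
low, high = means[500], means[9_500]
print(f"mean = {fmean(data):.4f}, 90% CI ({low:.4f}, {high:.4f})")
```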