
Author rhettinger
Recipients rhettinger, steven.daprano, tim.peters
Date 2019-02-05.23:59:20
The current mean() function makes heroic efforts to achieve last-bit accuracy and, when possible, to retain the data type of the input.

What is needed is an alternative with a simpler signature: one that is much faster, that is highly accurate without demanding perfection, and that does what people usually expect mean() to do, the same as their calculators or numpy.mean():

   import math
   from typing import Sequence

   def fmean(seq: Sequence[float]) -> float:
       return math.fsum(seq) / len(seq)
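As a quick illustration (a sketch, not part of the issue) of why math.fsum() is the right building block here: fsum() tracks exact partial sums, so the accumulated rounding error of naive summation disappears.

```python
import math

# Naive summation accumulates representation error from 0.1:
print(sum([0.1] * 10))        # 0.9999999999999999

# fsum() is correctly rounded over the whole sequence:
print(math.fsum([0.1] * 10))  # 1.0
```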

On my current 3.8 build, this code gives roughly a 44x speed-up (6.8 msec versus 155 usec per call, per the timings below).   Note that having a fast fmean() function is important in resampling statistics, where the mean() is typically called many times: 

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from statistics import mean' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
50 loops, best of 11: 6.8 msec per loop

$ ./python.exe -m timeit -r 11 -s 'from random import random' -s 'from math import fsum' -s 'mean=lambda seq: fsum(seq)/len(seq)' -s 'seq = [random() for i in range(10_000)]' 'mean(seq)'
2000 loops, best of 11: 155 usec per loop
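For context, here is a sketch (hypothetical setup, not from the issue) of the kind of bootstrap-resampling loop where the mean is recomputed thousands of times, so a fast fmean() pays off directly:

```python
import math
from random import choices, random, seed

def fmean(seq):
    # The proposed fast mean: one correctly rounded sum, one division.
    return math.fsum(seq) / len(seq)

seed(8675309)
data = [random() for _ in range(1_000)]

# 10,000 bootstrap resamples -> 10,000 calls to fmean()
means = sorted(fmean(choices(data, k=len(data))) for _ in range(10_000))

# Percentile-based 90% confidence interval for the mean
low, high = means[500], means[9_500]
print(f"mean = {fmean(data):.4f}, 90% CI ({low:.4f}, {high:.4f})")
```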