Message 338742 - Python tracker


Author	rhettinger
Recipients	cheryl.sabella, cool-RR, koobs, mark.dickinson, martin.panter, python-dev, rhettinger, steven.daprano, vstinner
Date	2019-03-24.18:24:21
Message-id	<1553451861.52.0.943199842847.issue27181@roundup.psfhosted.org>
In-reply-to

Content
Almost three years have passed.

In the spirit of "perfect is the enemy of good", would it be reasonable to start with a simple, fast implementation using exp-mean-log?  Then if someone wants to make it more accurate later, they can do so.
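For concreteness, here is a minimal sketch of what an exp-mean-log implementation might look like. The name geometric_mean_sketch and its input checks are assumptions made for illustration only; they are not the API proposed in this issue.

```python
from math import exp, log
from statistics import fmean  # available in Python 3.8+


def geometric_mean_sketch(data):
    """Geometric mean computed as exp(mean(log(x))).

    Hypothetical helper for illustration; the name and the error
    handling are assumptions, not an agreed-upon design.
    """
    values = list(data)
    if not values:
        raise ValueError("geometric mean requires at least one data point")
    if any(x <= 0 for x in values):
        raise ValueError("this sketch only handles positive inputs")
    # Averaging the logs avoids forming the (possibly huge) product directly.
    return exp(fmean(map(log, values)))


print(geometric_mean_sketch([12, 17, 13, 5, 120, 7]))
```

Working in log space keeps the intermediate values small, so long inputs or very large values do not overflow the way a running float product could.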

In some quick tests, I don't see much of an accuracy loss. It looks to be plenty good enough to use as a starting point:

--- Accuracy experiments ---

>>> from decimal import Decimal
>>> from functools import reduce
>>> from math import exp, fabs, log
>>> from operator import mul
>>> from random import expovariate, lognormvariate, triangular
>>> from statistics import fmean

>>> # https://www.wolframalpha.com/input/?i=geometric+mean+12,+17,+13,+5,+120,+7
>>> data = [12, 17, 13, 5, 120, 7]
>>> print(reduce(mul, map(Decimal, data)) ** (Decimal(1) / len(data)))
14.94412420173971227234687688
>>> exp(fmean(map(log, map(fabs, data))))
14.944124201739715

>>> data = [expovariate(50.0) for i in range(1_000)]
>>> print(reduce(mul, map(Decimal, data)) ** (Decimal(1) / len(data)))
0.01140902688569587677205587938
>>> exp(fmean(map(log, map(fabs, data))))
0.011409026885695879

>>> data = [triangular(2000.0, 3000.0, 2200.0) for i in range(10_000)]
>>> print(reduce(mul, map(Decimal, data)) ** (Decimal(1) / len(data)))
2388.381301718524160840023868
>>> exp(fmean(map(log, map(fabs, data))))
2388.3813017185225

>>> data = [lognormvariate(20.0, 3.0) for i in range(100_000)]
>>> min(data), max(data)
(2421.506538652375, 137887726484094.5)
>>> print(reduce(mul, map(Decimal, data)) ** (Decimal(1) / len(data)))
484709306.8805352290183838500
>>> exp(fmean(map(log, map(fabs, data))))
484709306.8805349
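
The comparisons above can also be expressed as a relative error against the Decimal reference. The following is a rough sketch under stated assumptions: the helper names reference_gmean and relative_error, the seed, and the sample size are all made up for illustration, and the Decimal reference is itself only accurate to the default 28-digit context.

```python
from decimal import Decimal
from functools import reduce
from math import exp, log
from operator import mul
from random import Random
from statistics import fmean


def reference_gmean(data):
    """Reference value: n-th root of the product, computed in Decimal."""
    product = reduce(mul, map(Decimal, data))
    return product ** (Decimal(1) / len(data))


def relative_error(data):
    """Relative error of the float exp-mean-log result vs. the reference."""
    approx = Decimal(exp(fmean(map(log, data))))
    exact = reference_gmean(data)
    return float(abs(approx - exact) / exact)


# Seeded generator so the run is repeatable (the seed is arbitrary).
rng = Random(12345)
data = [rng.triangular(2000.0, 3000.0, 2200.0) for _ in range(1_000)]
print(relative_error(data))
```

A relative error near the float rounding level would be consistent with the figures above, where the two results agree to roughly 15 significant digits.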

History
Date	User	Action	Args
2019-03-24 18:24:21	rhettinger	set	recipients: + rhettinger, mark.dickinson, vstinner, steven.daprano, cool-RR, python-dev, martin.panter, koobs, cheryl.sabella
2019-03-24 18:24:21	rhettinger	set	messageid: <1553451861.52.0.943199842847.issue27181@roundup.psfhosted.org>
2019-03-24 18:24:21	rhettinger	link	issue27181 messages
2019-03-24 18:24:21	rhettinger	create