Message267990
Choice of algorithm is a bit tricky here. There are a couple of obvious algorithms that work mathematically but result in significant accuracy loss in an IEEE 754 floating-point implementation: one is `exp(mean(map(log, my_numbers)))`, where the log calls can introduce significant loss of information, and the other is `prod(x**(1./len(my_numbers)) for x in my_numbers)`, where the `**(1./n)` operation similarly discards information. A better algorithm numerically is `prod(my_numbers)**(1./len(my_numbers))`, but that's likely to overflow quickly for large datasets (and/or datasets containing large values).
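To make the comparison concrete, here is a sketch of the three candidates (function names are mine, not from any proposed patch). The first two lose accuracy through the `log` and `**(1./n)` calls respectively; the third is the numerically better one that can overflow:

```python
import math

def gm_logs(xs):
    # exp(mean(map(log, xs))): each log call can lose information,
    # even with fsum used for the (exact-ish) summation step.
    return math.exp(math.fsum(map(math.log, xs)) / len(xs))

def gm_nth_roots(xs):
    # prod(x ** (1/n) for x in xs): each **(1/n) discards
    # information, and the n pow calls accumulate rounding error.
    n = len(xs)
    prod = 1.0
    for x in xs:
        prod *= x ** (1.0 / n)
    return prod

def gm_root_of_product(xs):
    # prod(xs) ** (1/n): better numerically, but the running
    # product overflows for large datasets / large values.
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1.0 / len(xs))
```

For example, `gm_root_of_product([1e300, 1e300])` already overflows to `inf`, even though the true geometric mean (1e300) is perfectly representable.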
I'd suggest something along the lines of `prod(my_numbers)**(1./len(my_numbers))`, but keeping track of the exponent of the product separately and renormalizing where necessary to avoid overflow.
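A minimal sketch of that suggestion (my naming and error handling, not a proposed patch): use `math.frexp` to renormalize the running product into [0.5, 1.0) at every step, carrying the power-of-two exponent in an exact integer, so the product can never overflow or underflow:

```python
import math

def geometric_mean(xs):
    # Sketch: product-with-separate-exponent. The float mantissa is
    # renormalized into [0.5, 1.0) after each multiply; the binary
    # exponent accumulates exactly in a Python int.
    if not xs:
        raise ValueError("geometric mean requires at least one value")
    mantissa = 1.0
    exponent = 0
    for x in xs:
        if x <= 0.0:
            raise ValueError("geometric mean requires positive values")
        mantissa *= x
        mantissa, e = math.frexp(mantissa)
        exponent += e
    n = len(xs)
    # (mantissa * 2**exponent) ** (1/n)
    #     == mantissa ** (1/n) * 2 ** (exponent / n)
    return mantissa ** (1.0 / n) * 2.0 ** (exponent / n)
```

With this, `geometric_mean([1e300] * 10)` returns approximately 1e300 instead of overflowing, since the intermediate product is never materialized as a single float.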
There are also algorithms for improved accuracy in a product, along the same lines as the algorithm used in fsum. See e.g., the paper "Accurate Floating-Point Product and Exponentiation" by Stef Graillat. [1] (I didn't know about this paper: I just saw a reference to it in a StackOverflow comment [2], which reminded me of this issue.)
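As a rough illustration of the compensated-product idea (a first-order sketch in the spirit of Graillat's CompProd, not a transcription of the paper's algorithm): each multiplication is replaced by an error-free transformation that also yields the exact rounding error, and the accumulated errors are folded back in at the end:

```python
import math

def two_product(a, b):
    # Dekker/Veltkamp error-free transformation: returns (p, e) with
    # p = fl(a * b) and p + e == a * b exactly (barring over/underflow).
    p = a * b
    c = 134217729.0  # Veltkamp split constant, 2**27 + 1, for binary64
    a_hi = a * c - (a * c - a)
    a_lo = a - a_hi
    b_hi = b * c - (b * c - b)
    b_lo = b - b_hi
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

def comp_prod(xs):
    # Compensated product: track the running product p plus a running
    # (approximate) correction term built from the per-step errors.
    p = 1.0
    err = 0.0
    for x in xs:
        p, e = two_product(p, x)
        err = err * x + e
    return p + err
```

On hardware with a fused multiply-add, `two_product` reduces to one `fma` call (`math.fma` in Python 3.13+), which is how such algorithms are usually implemented in practice.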
[1] http://www-pequan.lip6.fr/~graillat/papers/IEEE-TC-prod.pdf
[2] http://stackoverflow.com/questions/37715250/safe-computation-of-geometric-mean
Date | User | Action | Args
2016-06-09 08:23:34 | mark.dickinson | set | recipients: + mark.dickinson, rhettinger, steven.daprano, cool-RR
2016-06-09 08:23:34 | mark.dickinson | set | messageid: <1465460614.79.0.993129893767.issue27181@psf.upfronthosting.co.za>
2016-06-09 08:23:34 | mark.dickinson | link | issue27181 messages
2016-06-09 08:23:34 | mark.dickinson | create |