Author steven.daprano
Recipients josh.r, mark.dickinson, rhettinger, steven.daprano, tim.peters
Date 2019-02-07.12:14:52

In the PEP, I did say that I was making no attempt to compete with numpy 
for speed, and that correctness was more important than speed.

That doesn't mean I don't care about speed. Nor do I necessarily care 
about absolute precision when given nothing but float arguments. Mark 
suggests that using fsum() will be accurate to within 1.5 ulp, which 
satisfies me for float arguments.
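
For what it's worth, here is the sort of difference fsum() makes on a 
pure-float example (my own quick illustration, not Mark's measurements):

    from math import fsum

    data = [0.1] * 10
    print(sum(data))               # 0.9999999999999999 -- rounding error accumulates
    print(fsum(data))              # 1.0 -- partial sums tracked exactly, rounded once
    print(fsum(data) / len(data))  # 0.1, so the mean is correctly rounded here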

I doubt that stdev etc. would be able to promise that accuracy, so 
provided your data is all floats, that seems like a pretty good result 
for the mean.

But I'm not super-keen on having two separate mean() functions if that 
opens the floodgates to people wanting every statistics function to grow 
a fast-but-floats-only twin. That would make me sad. But maybe mean() is 
special enough to justify twinning it.

In my ideal world, I'd have a single mean() function that had a 
fast-path for float data, but would automatically drop down to a slower 
but more accurate path for other types, out-of-range data, etc. I 
believe that the built-in sum() function does something like this.
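
Roughly the kind of thing I have in mind, as a sketch only (the helper 
logic and names here are mine, not the statistics module's actual code):

    from math import fsum
    from fractions import Fraction

    def mean(data):
        data = list(data)
        if not data:
            raise ValueError("mean requires at least one data point")
        if all(type(x) is float for x in data):
            # Fast path: fsum() gives a correctly rounded float sum, so
            # for pure-float data this is accurate to within a couple ulp.
            return fsum(data) / len(data)
        # Slow but exact path: accumulate as Fractions, divide at the end.
        # (ints, Fractions and Decimals all convert to Fraction exactly.)
        total = sum(Fraction(x) for x in data)
        return total / len(data)

The real function would also need to convert the result back to the 
input type on the way out, as statistics.mean() currently does, but 
that's the general shape of it.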

When I say "more accurate", this isn't a complaint about fsum(). It 
refers to the limitation of floats themselves. Call me Mr Silly if you 
like, but if I need to take the average of numbers bigger than 2**1074 I 
would like to be able to, even if it takes a bit longer :-)
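
To make that concrete (again, just my own toy example):

    big = 2**1074            # larger than the largest finite float (about 2**1024)
    try:
        float(big)           # a float-only fast path has to do this conversion
    except OverflowError:
        print("too big for a float")
    print((big + (big + 2)) // 2)   # the exact mean of big and big+2: 2**1074 + 1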