This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author mark.dickinson
Recipients mark.dickinson, reed, rhettinger, steven.daprano, taleinat, xtreak
Date 2020-01-05.10:56:51
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1578221811.97.0.50820206116.issue39218@roundup.psfhosted.org>
In-reply-to
Content
[Karthikeyan]

> can possibly break again if (x-c) * (x-c) was also changed to return float64 in future

I think it's safe to assume that multiplying two NumPy float32's will continue to give a float32 back in the future; NumPy has no reason to give back a different type, and changing this would be a big breaking change.

The big difference between (x-c)**2 and (x-c)*(x-c) in this respect is that the latter is purely an operation on float32 operands, while the former is a mixed-type operation: NumPy sees a binary operation between a float32 and a Python int, and has to figure out a suitable type for the result. The int to float32 conversion is regarded as introducing unacceptable precision loss, so it chooses float64 as an acceptable output type, converts both input operands to that type, and then does the computation.

Some relevant parts of the NumPy docs:

https://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules
https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html

Even for pure Python floats, x*x is a simpler, more accurate, and likely faster operation than x**2. On a typical system, the former (eventually) maps to a hardware floating-point multiplication, which is highly likely to be correctly rounded, while the latter, after conversion of the r.h.s. to a float, maps to a libm pow call. That libm pow call *could* conceivably have a fast/accurate path for a right-hand-side of 2.0, but it could equally conceivably not have such a path.

OTOH, (x-c)*(x-c) repeats the subtraction unnecessarily, but perhaps assignment expressions could rescue us? For example, replace:

    sum((x-c)**2 for x in data)

with:

    sum((y:=(x-c))*y for x in data)

[Steven]

> I am inclined to have the stdev of float32 return a float32 is possible. 

Would that also imply intermediate calculations being performed only with float32, or would intermediate calculations be performed with a more precise type? float32 has small enough precision to run into real accuracy problems with modestly large datasets. For example:

    >>> import numpy as np
    >>> sum(np.ones(10**8, dtype=np.float32), start=np.float32(0))
    16777216.0  # should be 10**8

or, less dramatically:

    >>> sum(np.full(10**6, 1.73, dtype=np.float32), start=np.float32(0)) / 10**6
    1.74242125  # only two accurate significant figures
History
Date User Action Args
2020-01-05 10:56:52mark.dickinsonsetrecipients: + mark.dickinson, rhettinger, taleinat, steven.daprano, xtreak, reed
2020-01-05 10:56:51mark.dickinsonsetmessageid: <1578221811.97.0.50820206116.issue39218@roundup.psfhosted.org>
2020-01-05 10:56:51mark.dickinsonlinkissue39218 messages
2020-01-05 10:56:51mark.dickinsoncreate