Message 359329 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mark.dickinson
Recipients	mark.dickinson, reed, rhettinger, steven.daprano, taleinat, xtreak
Date	2020-01-05.10:56:51
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1578221811.97.0.50820206116.issue39218@roundup.psfhosted.org>
In-reply-to

Content
[Karthikeyan] > can possibly break again if (x-c) * (x-c) was also changed to return float64 in future I think it's safe to assume that multiplying two NumPy float32's will continue to give a float32 back in the future; NumPy has no reason to give back a different type, and changing this would be a big breaking change. The big difference between (x-c)*2 and (x-c)(x-c) in this respect is that the latter is purely an operation on float32 operands, while the former is a mixed-type operation: NumPy sees a binary operation between a float32 and a Python int, and has to figure out a suitable type for the result. The int to float32 conversion is regarded as introducing unacceptable precision loss, so it chooses float64 as an acceptable output type, converts both input operands to that type, and then does the computation. Some relevant parts of the NumPy docs: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html Even for pure Python floats, xx is a simpler, more accurate, and likely faster operation than x2. On a typical system, the former (eventually) maps to a hardware floating-point multiplication, which is highly likely to be correctly rounded, while the latter, after conversion of the r.h.s. to a float, maps to a libm pow call. That libm pow call could* conceivably have a fast/accurate path for a right-hand-side of 2.0, but it could equally conceivably not have such a path. OTOH, (x-c)(x-c) repeats the subtraction unnecessarily, but perhaps assignment expressions could rescue us? For example, replace: sum((x-c)2 for x in data) with: sum((y:=(x-c))y for x in data) [Steven] > I am inclined to have the stdev of float32 return a float32 is possible. Would that also imply intermediate calculations being performed only with float32, or would intermediate calculations be performed with a more precise type? float32 has small enough precision to run into real accuracy problems with modestly large datasets. For example: >>> import numpy as np >>> sum(np.ones(108, dtype=np.float32), start=np.float32(0)) 16777216.0 # should be 108 or, less dramatically: >>> sum(np.full(106, 1.73, dtype=np.float32), start=np.float32(0)) / 106 1.74242125 # only two accurate significant figures

[Karthikeyan]

> can possibly break again if (x-c) * (x-c) was also changed to return float64 in future

I think it's safe to assume that multiplying two NumPy float32's will continue to give a float32 back in the future; NumPy has no reason to give back a different type, and changing this would be a big breaking change.

The big difference between (x-c)**2 and (x-c)*(x-c) in this respect is that the latter is purely an operation on float32 operands, while the former is a mixed-type operation: NumPy sees a binary operation between a float32 and a Python int, and has to figure out a suitable type for the result. The int to float32 conversion is regarded as introducing unacceptable precision loss, so it chooses float64 as an acceptable output type, converts both input operands to that type, and then does the computation.

Some relevant parts of the NumPy docs:

https://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules
https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html

Even for pure Python floats, x*x is a simpler, more accurate, and likely faster operation than x**2. On a typical system, the former (eventually) maps to a hardware floating-point multiplication, which is highly likely to be correctly rounded, while the latter, after conversion of the r.h.s. to a float, maps to a libm pow call. That libm pow call *could* conceivably have a fast/accurate path for a right-hand-side of 2.0, but it could equally conceivably not have such a path.

OTOH, (x-c)*(x-c) repeats the subtraction unnecessarily, but perhaps assignment expressions could rescue us? For example, replace:

    sum((x-c)**2 for x in data)

with:

    sum((y:=(x-c))*y for x in data)

[Steven]

> I am inclined to have the stdev of float32 return a float32 is possible. 

Would that also imply intermediate calculations being performed only with float32, or would intermediate calculations be performed with a more precise type? float32 has small enough precision to run into real accuracy problems with modestly large datasets. For example:

    >>> import numpy as np
    >>> sum(np.ones(10**8, dtype=np.float32), start=np.float32(0))
    16777216.0  # should be 10**8

or, less dramatically:

    >>> sum(np.full(10**6, 1.73, dtype=np.float32), start=np.float32(0)) / 10**6
    1.74242125  # only two accurate significant figures

History
Date	User	Action	Args
2020-01-05 10:56:52	mark.dickinson	set	recipients: + mark.dickinson, rhettinger, taleinat, steven.daprano, xtreak, reed
2020-01-05 10:56:51	mark.dickinson	set	messageid: <1578221811.97.0.50820206116.issue39218@roundup.psfhosted.org>
2020-01-05 10:56:51	mark.dickinson	link	issue39218 messages
2020-01-05 10:56:51	mark.dickinson	create