
Author mark.dickinson
Recipients iritkatriel, mark.dickinson, reed, rhettinger, steven.daprano, xtreak
Date 2021-08-26.08:38:33
Message-id <1629967113.12.0.905388258672.issue39218@roundup.psfhosted.org>
Content
> what it's correcting for is an inaccurate value of "c" [...]

In more detail:

Suppose "m" is the true mean of the x in data, but all we have is an approximate mean "c" to work with. Write "e" for the error in that approximation, so that c = m + e, and write "n" for len(data). Then (using Python notation, but treating the expressions as exact mathematical expressions computed in the reals):

   sum((x-c)**2 for x in data)

== sum((x-m-e)**2 for x in data)

== sum((x - m)**2 for x in data) - 2 * sum((x - m)*e for x in data)
                                 + sum(e**2 for x in data)

== sum((x - m)**2 for x in data) - 2 * e * sum((x - m) for x in data)
                                 + sum(e**2 for x in data)

== sum((x - m)**2 for x in data) + sum(e**2 for x in data)
       (because sum((x - m) for x in data) is 0)

== sum((x - m)**2 for x in data) + n*e**2
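A quick way to sanity-check the identity above is to evaluate both sides with exact rational arithmetic. This is only a sketch: the sample data and the size of the error e are made up, and fractions.Fraction stands in for "exact computation in the reals".

```python
from fractions import Fraction

# Made-up sample data; Fraction keeps every operation exact.
data = [Fraction(x) for x in (2, 3, 5, 7, 11)]
n = len(data)
m = sum(data) / n            # true mean
c = m + Fraction(1, 3)       # approximate mean, deliberately off by 1/3
e = c - m                    # error in the approximation, so c == m + e

# The deviations about the true mean sum to zero, which is the step
# that kills the cross term in the derivation.
assert sum(x - m for x in data) == 0

lhs = sum((x - c)**2 for x in data)
rhs = sum((x - m)**2 for x in data) + n * e**2
assert lhs == rhs
```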

So the error in our result that comes from using c in place of m is exactly that n*e**2 term. And that's the term that's being subtracted here, because

   sum(x - c for x in data) ** 2 / n
== sum(x - m - e for x in data) ** 2 / n
== (sum(x - m for x in data) - sum(e for x in data))**2 / n
== (0 - n * e)**2 / n
== n * e**2
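Putting the two derivations together, subtracting sum(x - c for x in data)**2 / n from sum((x - c)**2 for x in data) recovers the sum of squares about the true mean, whatever c is. The sketch below checks that with exact arithmetic; corrected_ss is a hypothetical helper written for illustration, not the statistics module's own code.

```python
from fractions import Fraction

def corrected_ss(data, c):
    """Hypothetical helper: sum of squared deviations about the true
    mean of data, computed from an approximate mean c."""
    n = len(data)
    total = sum((x - c)**2 for x in data)         # true SS + n*e**2
    correction = sum(x - c for x in data)**2 / n  # equals n*e**2
    return total - correction

data = [Fraction(x) for x in (2, 3, 5, 7, 11)]
m = sum(data) / len(data)
exact_ss = sum((x - m)**2 for x in data)

# However far off the guess c is, the correction removes the error exactly.
for c in (m, Fraction(5), Fraction(-100, 7)):
    assert corrected_ss(data, c) == exact_ss
```

With float data instead of Fraction, both sides are subject to rounding, so the equality would only hold approximately.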
History
Date                 User            Action  Args
2021-08-26 08:38:33  mark.dickinson  set     recipients: + mark.dickinson, rhettinger, steven.daprano, xtreak, reed, iritkatriel
2021-08-26 08:38:33  mark.dickinson  set     messageid: <1629967113.12.0.905388258672.issue39218@roundup.psfhosted.org>
2021-08-26 08:38:33  mark.dickinson  link    issue39218 messages
2021-08-26 08:38:33  mark.dickinson  create