Issue21046

Created on **2014-03-24 03:10** by **zach.ware**, last changed **2014-06-28 01:37** by **ezio.melotti**. This issue is now **closed**.

Messages (14)

msg214666 | Author: Zachary Ware (zach.ware) * | Date: 2014-03-24 03:10

From docs@:

On Sun, Mar 23, 2014 at 5:55 PM, Alex <aaa5500 at ya.ru> wrote:
> http://docs.python.org/dev/library/statistics.html
>
> I know math; I graduated from an institute, though in Russia. The docs don't show me WHAT FORMULAS are used for mean, median, median_low, etc. I cannot understand the docs. Please write the formulas:
>
> e.g. mean = sum(x[i] from i=1 to N) / N
>
> Regards
> Alex
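For reference, the ASCII formula requested in the email corresponds to the usual notation for the arithmetic mean of N values x_1 to x_N:

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```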

msg214667 | Author: Raymond Hettinger (rhettinger) * | Date: 2014-03-24 04:19

At the top of the documentation page is a link to the pure Python source code for the statistics functions. The source for the main functions is short, readable, and clear about exactly what is being done. The code for the helper functions like _sum() is a bit convoluted, but the basic idea is to perform basic math in a way that doesn't lose precision.
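A minimal sketch of what "math that doesn't lose precision" means for a sum, assuming float inputs only. The names `naive_sum` and `exact_sum` are hypothetical, and this is not the real `statistics._sum`, which also handles int, Fraction and Decimal data and converts the result back to the input type. The idea shown is to convert every float to an exact `fractions.Fraction` so that no rounding happens until the final conversion. The naive version uses a plain loop because recent versions of the built-in `sum()` apply compensated summation to floats.

```python
import statistics
from fractions import Fraction


def naive_sum(data):
    """Add floats left to right; every addition may round."""
    total = 0.0
    for x in data:
        total += x
    return total


def exact_sum(data):
    """Add floats without rounding by converting each one to a Fraction."""
    total = Fraction(0)
    for x in data:
        # Fraction(x) represents the float's binary value exactly.
        total += Fraction(x)
    return total


data = [1e16, 1.0, -1e16]
print(naive_sum(data))                     # 0.0: the 1.0 is absorbed by 1e16
print(exact_sum(data))                     # 1
print(float(exact_sum(data) / len(data)))  # 0.3333333333333333
print(statistics.mean(data))               # 0.3333333333333333
```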

msg214684 | Author: Steven D'Aprano (steven.daprano) * | Date: 2014-03-24 11:37

If any of the docs are unclear, I would be very happy to take suggestions to improve them. But I'm not entirely sure that the docs are the right place to show the equations. You should be able to look them up on Wikipedia or Wolfram Mathworld if you have doubts about them. Some of the functions, like mode() and the median functions, don't have equations. I am willing to include equations where appropriate, if others think that the documentation will be enhanced by them. But so far, I am unconvinced of the need.
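Even without an algebraic formula, a few lines of Python can state exactly what mode computes. A minimal sketch using `collections.Counter`; `mode_sketch` is a hypothetical name, and unlike `statistics.mode` it makes no attempt to reproduce the library's behaviour when several values tie for most common (that behaviour has differed between Python versions).

```python
import statistics
from collections import Counter


def mode_sketch(data):
    """Return the most common value in data; ties are not treated specially."""
    counts = Counter(data)
    if not counts:
        raise ValueError("no mode for empty data")
    value, _count = counts.most_common(1)[0]
    return value


sample = [1, 1, 2, 3, 3, 3, 4]
print(mode_sketch(sample))                   # 3
print(statistics.mode(sample))               # 3
print(mode_sketch(["red", "blue", "blue"]))  # blue
```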

msg214692 | Author: Alextp (Alextp) | Date: 2014-03-24 16:11

I'm the author of the topic. I suggest giving simple formulas, for example:

1) mean. Calculates the sum of all values in the iterable, divided by the number of elements. E.g. mean([x1, x2, ..., xN]) = (x1 + x2 + ... + xN) / N
2) median. Calculates the value at the middle index of the iterable. If the number of elements is even, i.e. no strict middle index exists, the function takes the average of the two values at the indexes nearest the middle. E.g. median([x1, x2, x3, x4, x5]) = x3; median([x1, x2, x3, x4, x5, x6]) = (x3 + x4) / 2
3) median_low. Calculates the value at the middle index of the iterable. If the number of elements is even, i.e. no strict middle index exists, the function takes the value at the nearest index below the middle.
4) median_high. Calculates the value at the middle index of the iterable. If the number of elements is even, i.e. no strict middle index exists, the function takes the value at the nearest index above the middle.
5) median_grouped. (NOTE: I may not understand median_grouped correctly.) Calculates the average of the values of the iterable at the L middle indexes. E.g. median_grouped([x1, x2, x3, x4, x5], L=3) = (x2 + x3 + x4) / 3. NOTE: please check this!
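The first four descriptions in the proposal (mean, median, median_low, median_high) can be written as short Python equivalents. This is a minimal sketch assuming plain numeric data; the `*_sketch` names are hypothetical, and unlike the statistics module these versions do not guard against precision loss or give special treatment to Decimal and Fraction. Note that the medians operate on the data after sorting, which the x1, ..., xN notation leaves implicit.

```python
import statistics


def mean_sketch(data):
    """Sum of the values divided by how many there are."""
    data = list(data)
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)


def median_sketch(data):
    """Middle of the sorted data; mean of the two middle values when n is even."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def median_low_sketch(data):
    """Middle value when n is odd, otherwise the smaller of the two middle values."""
    data = sorted(data)
    if not data:
        raise ValueError("no median for empty data")
    return data[(len(data) - 1) // 2]


def median_high_sketch(data):
    """Middle value when n is odd, otherwise the larger of the two middle values."""
    data = sorted(data)
    if not data:
        raise ValueError("no median for empty data")
    return data[len(data) // 2]


odd = [5, 1, 3, 9, 7]       # sorted: 1, 3, 5, 7, 9
even = [5, 1, 3, 9, 7, 11]  # sorted: 1, 3, 5, 7, 9, 11

print(mean_sketch(even))                                  # 6.0
print(median_sketch(odd), median_sketch(even))            # 5 6.0
print(median_low_sketch(even), median_high_sketch(even))  # 5 7

# Cross-check against the standard library on both samples.
for sketch, real in [
    (mean_sketch, statistics.mean),
    (median_sketch, statistics.median),
    (median_low_sketch, statistics.median_low),
    (median_high_sketch, statistics.median_high),
]:
    for sample in (odd, even):
        assert sketch(sample) == real(sample), sketch.__name__
```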

msg214695 | Author: Alextp (Alextp) | Date: 2014-03-24 16:30

The formula I wrote for median_grouped is wrong, but I can't get the idea from the source. THIS SHOWS that the source code is NOT adequate documentation: even a student can't follow it.

E.g. pvariance. Calculates the population variance of the iterable. It's given by the formula: pvariance([x1, x2, ..., xN]) = ((x1 - M)**2 + ... + (xN - M)**2) / N, where M is the mean of all values: M = (x1 + ... + xN) / N
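median_grouped does not average L middle values. It treats each data value as the midpoint of a class interval of width `interval` (1 by default) and interpolates the median inside the class that holds the middle value, using L + interval * (n/2 - cf) / f, where L is the lower limit of that class, cf the number of values below it, and f the number of values in it. A minimal sketch of that interpolation, assuming numeric data and that each distinct value forms its own class; `median_grouped_sketch` is a hypothetical name.

```python
import statistics


def median_grouped_sketch(data, interval=1):
    """Interpolated median of data grouped into classes of width `interval`."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    x = data[n // 2]          # midpoint of the median class
    L = x - interval / 2      # lower limit of the median class
    cf = data.index(x)        # number of values below the median class
    f = data.count(x)         # number of values inside the median class
    return L + interval * (n / 2 - cf) / f


print(median_grouped_sketch([52, 52, 53, 54]))             # 52.5
print(median_grouped_sketch([1, 3, 3, 5, 7]))              # 3.25
print(median_grouped_sketch([1, 3, 3, 5, 7], interval=2))  # 3.5
print(statistics.median_grouped([1, 3, 3, 5, 7]))          # 3.25
```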

msg214705 | Author: Alextp (Alextp) | Date: 2014-03-24 18:23

5) pvariance. Calculates the "population variance" of the iterable by this formula: pvariance([x1, x2, ..., xN], M) = ((x1 - M)**2 + ... + (xN - M)**2) / N. M is an optional argument, which should be the value of mean([x1, ..., xN]) calculated beforehand. If M is omitted from the call, it's calculated automatically: M = (x1 + ... + xN) / N
6) variance. (NOTE: please check this.) Calculates the "sample variance" of the iterable. It's given by the same formula as pvariance, but not over the entire set of values in the iterable; only a subset of the iterable is used for the calculation. .......... (Write here how this subset is taken, randomly or otherwise; I didn't get it from Wikipedia.)

OK?
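On the open question about variance: sample variance is not computed from a subset of the data. It uses all N values, exactly like pvariance, but divides the sum of squared deviations by N - 1 instead of N (Bessel's correction). A minimal sketch of both, assuming plain numeric data and ordinary float arithmetic; the `*_sketch` names are hypothetical. The optional argument plays the role of M in the proposal; in the statistics module it is called `mu` for pvariance and `xbar` for variance.

```python
import statistics


def pvariance_sketch(data, mu=None):
    """Population variance: sum of squared deviations from mu, divided by N."""
    data = list(data)
    n = len(data)
    if n < 1:
        raise ValueError("pvariance requires at least one data point")
    if mu is None:
        mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def variance_sketch(data, xbar=None):
    """Sample variance: the same sum of squared deviations, divided by N - 1."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    if xbar is None:
        xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


data = [1, 2, 3, 4, 5]          # mean 3, squared deviations sum to 10
print(pvariance_sketch(data))   # 2.0  (10 / 5)
print(variance_sketch(data))    # 2.5  (10 / 4)
print(statistics.pvariance(data), statistics.variance(data))
```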

msg214706 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-03-24 18:27

IMHO the docs shouldn't be cluttered with details such as this.

msg214711 | Author: Alextp (Alextp) | Date: 2014-03-24 19:08

Without details like these, there should at least be URLs to Wikipedia or Wolfram. Ordinary users don't know how to search Wolfram.

msg218523 | Author: Ezio Melotti (ezio.melotti) * | Date: 2014-05-14 12:04

> E.g.
> median([x1, x2, x3, x4, x5]) = x3
> median([x1, x2, x3, x4, x5, x6]) = (x3 + x4) / 2

The docs already seem to contain similar examples for some of the functions (e.g. median()), but not for others (e.g. mean()). For these, if the formula can be expressed with a simple Python equivalent (e.g. sum(values) / len(values)), I think it would be reasonable to add it.

msg218639 | Author: Alextp (Alextp) | Date: 2014-05-16 01:41

@Ezio: of course, many of these functions CANNOT be expressed as simple formulas, only with some text. I have shown example descriptions for almost all of them above.

msg218646 | Author: Ezio Melotti (ezio.melotti) * | Date: 2014-05-16 07:50

Do you want to propose a patch?

msg218651 | Author: Steven D'Aprano (steven.daprano) * | Date: 2014-05-16 10:17

On Fri, May 16, 2014 at 07:50:16AM +0000, Ezio Melotti wrote:
> Do you want to propose a patch?

I'm really not sure that I agree with this request. I'm currently sitting on the fence, undecided, about 60% against and 40% in favour of explicitly documenting the formulae. This is not Mathworld or Wikipedia, and it is easy to google for "variance" to find out what it means.

This request originally came from somebody who claimed he didn't know what the functions were from the names (mean, median, variance) but would recognise them from the formulae. Given how hard it is to accurately portray mathematical formulae in plain text, and how many different versions of the mathematical formulae there are, I don't think that will apply to very many people.

There's no good way to write mathematical functions *accurately* in ASCII text. I can write mean(L) = sum(L)/len(L), for example; that's quite trivial. But it's not the usual mathematical formula. If the OP doesn't recognise the name "mean", will he recognise that non-standard formula? Should the docs include μ = ∑x÷n? But even that's not quite accurate: where's the subscript on the x? The reader needs to understand the formula, and they aren't going to get that here. They probably have to go read Mathworld or Wikipedia regardless.

The problem is compounded with variance. Which of these should we write?

σ² = ∑(x - μ)² ÷ n
s² = ∑x² ÷ n - μ²
s[n]² = ∑(x - a)² ÷ n
Var(X) = E[(X-μ)²]
Var(X) = E[X²] - (E[X])²

or something else?

What do other statistics packages do? I wouldn't want to do *less*: if it is common for other stats packages to show the formula, then I would agree we should do the same. R doesn't seem to do so: http://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html
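The first two variance forms listed (σ² = ∑(x - μ)² ÷ n and ∑x² ÷ n - μ²) are equal in exact arithmetic but not always in floating point, which is one practical wrinkle behind the question of which formula to show. A minimal sketch, assuming plain float arithmetic; the function names are hypothetical. The shifted data are the values 4, 7, 13, 16 plus 1e9, whose true population variance is 22.5.

```python
import statistics


def pvariance_two_pass(data):
    """sum((x - mu)^2) / n: subtract the mean first, then square."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def pvariance_one_pass(data):
    """sum(x^2) / n - mu^2: algebraically equal, but subtracts two huge numbers."""
    n = len(data)
    mu = sum(data) / n
    return sum(x * x for x in data) / n - mu * mu


small = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pvariance_two_pass(small), pvariance_one_pass(small))  # 2.0 2.0

shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(pvariance_two_pass(shifted))    # 22.5
# Each x * x (about 1e18) is rounded to a multiple of 128, so this cannot be 22.5.
print(pvariance_one_pass(shifted))
print(statistics.pvariance(shifted))  # 22.5
```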

msg218652 | Author: Ezio Melotti (ezio.melotti) * | Date: 2014-05-16 10:44

From msg214692 it seems to me that Alex wants "Python-friendly" formulas or examples, rather than mathematical formulas. Most functions already seem to have them, so I was asking for a patch to get a better idea of which functions he thinks should be improved and how.

As an example, the itertools docs have simple "formulas" explaining what each function does and an example in the table at the top, and (possibly approximate) Python equivalents for most of the functions: https://docs.python.org/dev/library/itertools.html

While the Python equivalents are probably not needed here, some simple formulas/examples might be OK, but I would have to see what exactly Alex is proposing.

msg221625 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-06-26 17:36

Three months gone and still no patch, not that I believe one is needed. I'm inclined to close as "won't fix"; there's nothing to stop it being reopened if needed.

History

Date | User | Action | Args
---|---|---|---
2014-06-28 01:37:32 | ezio.melotti | set | status: open -> closed; resolution: works for me; stage: needs patch -> resolved
2014-06-26 18:04:52 | zach.ware | set | nosy: - zach.ware
2014-06-26 17:36:13 | BreamoreBoy | set | messages: + msg221625
2014-05-16 10:44:40 | ezio.melotti | set | messages: + msg218652
2014-05-16 10:17:09 | steven.daprano | set | messages: + msg218651
2014-05-16 07:50:16 | ezio.melotti | set | messages: + msg218646
2014-05-16 01:41:58 | Alextp | set | messages: + msg218639
2014-05-14 12:04:57 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg218523
2014-03-24 19:08:59 | Alextp | set | messages: + msg214711
2014-03-24 18:27:29 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg214706
2014-03-24 18:23:53 | Alextp | set | messages: + msg214705
2014-03-24 16:30:08 | Alextp | set | messages: + msg214695
2014-03-24 16:11:36 | Alextp | set | nosy: + Alextp; messages: + msg214692
2014-03-24 11:37:53 | steven.daprano | set | messages: + msg214684
2014-03-24 04:19:47 | rhettinger | set | nosy: + rhettinger; messages: + msg214667
2014-03-24 03:10:19 | zach.ware | create |