A few small comments and nits.
1. I'm with the author on the question of a sum function in this module. The arguments that builtin sum isn't accurate enough, and neither is math.fsum for cases where all data is of infinite precision, are enough for me.
2. A general percentile function should be high on the list of next additions.
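To make point 1 concrete, here is a minimal sketch of the accuracy gap, using only the standard library. The explicit loop stands in for naive left-to-right float addition, since I believe recent Python versions improve builtin sum for floats; the variable names are mine, not the module's.

```python
import math
from fractions import Fraction

data = [0.1] * 10

# Naive left-to-right float addition accumulates rounding error.
naive = 0.0
for x in data:
    naive += x

# math.fsum tracks the lost low-order bits and rounds only once.
exact = math.fsum(data)

print(naive)  # 0.9999999999999999
print(exact)  # 1.0

# With exact (rational) data, builtin sum stays exact,
# while math.fsum coerces everything to float.
thirds = [Fraction(1, 3)] * 3
exact_total = sum(thirds)
print(exact_total)                       # 1
print(type(math.fsum(thirds)).__name__)  # float
```

So neither existing function covers both cases: one loses accuracy on floats, the other loses the exact type.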
A substantive question:
3. Can't add_partial be used in the one-pass algorithms?
Several typos and suggested style tweaks:
4. I would find the summary more readable if grouped by function:
add_partial, sum, StatisticsError; mean, median, mode; pstdev, pvariance, stdev, variance. Maybe I'd like it better if the utilities came last. IMO; YMMV, of course.
5. In the big comment in add_partial, "the inner loop" is mentioned. Indeed this is the inner loop in statistics.sum, but there's only one loop in add_partial.
6. In the Limitations section of sum's docstring it says "these limitations may change". Is "these limitations may be relaxed" what is meant? I would hope so, but the current phrasing makes me nervous.
7. In sum, there are two comments referring to the construct "type(total).__float__(total)", with the first being a forward reference to the second. I would find a single comment above the "isinstance(total, float)" test more readable. E.g.:
"""
First, accumulate a non-float sum. Until we find a float, we keep adding.
If we find a float, we exit this loop, convert the partial sum to float, and continue with the float code below. Non-floats are converted to float with 'type(x).__float__(x)'. Don't call float() directly, as that converts strings and we don't want that. Also, like all dunder methods, we should call __float__ on the class, not the instance.
"""
8. The docstrings for mean and variance say they are unbiased. This depends on the strong assumption of a representative (typically i.i.d.) sample. I think this should be mentioned.
9. Several docstrings say "this function should be used when ...". In fact the choice of which function to use is somewhat delicate. My personal preference would be to use "may" rather than "should."
10. In several of the mode functions, the value is a sorted sequence. The sort key should be specified, because it could be the data value or the score.
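Regarding point 7, a small sketch of why the comment's advice matters. The helper name to_float is hypothetical and only illustrates the "type(x).__float__(x)" construct; it is not taken from the module.

```python
from fractions import Fraction


def to_float(x):
    """Hypothetical helper: convert via the class's __float__, not float()."""
    return type(x).__float__(x)


print(to_float(Fraction(1, 2)))  # 0.5
print(to_float(3))               # 3.0

# float() happily parses strings, which a numeric sum should not accept.
print(float("1.5"))              # 1.5

# str defines no __float__, so the class-level call rejects it.
try:
    to_float("1.5")
except AttributeError:
    print("str rejected")
```

Looking the method up on the class also matches how the interpreter itself finds dunder methods, so an instance attribute named __float__ would not change the result.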