Author davin
Recipients davin, mark.dickinson, rhettinger, selik, steven.daprano, tim.peters, xtreak
Date 2019-02-23.23:37:55
Message-id <1550965075.57.0.309398923647.issue36018@roundup.psfhosted.org>
In-reply-to
Content
There is an inconsistency worth paying attention to in the names chosen for the input parameters that represent the mean.

Currently in the statistics module, pvariance() and pstdev() each accept a parameter named "mu", while variance() and stdev() each accept a parameter named "xbar".  The docs describe both "mu" and "xbar" as "it should be the mean of data".  I suggest it is worth rationalizing the names used within the statistics module for consistency before reusing "mu" or "xbar" or anything else in NormalDist.
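
For anyone who wants to check the current names directly, here is a minimal sketch that reads them from the function signatures.  The helper name mean_parameter_names is my own invention for illustration and is not part of the statistics module.

```python
import inspect
import statistics


def mean_parameter_names():
    """Map each variance/stdev function to the name of its optional mean parameter.

    Hypothetical helper for illustration only; it is not part of the
    statistics module.  It assumes the optional mean is the second
    parameter of each function, after "data".
    """
    funcs = (
        statistics.pvariance,
        statistics.pstdev,
        statistics.variance,
        statistics.stdev,
    )
    names = {}
    for func in funcs:
        params = list(inspect.signature(func).parameters)
        names[func.__name__] = params[1]
    return names


for func_name, param_name in mean_parameter_names().items():
    print(f"{func_name}() takes the mean as {param_name!r}")
```

Passing the mean by keyword (for example statistics.pvariance(data, mu=m)) is where the difference in names becomes visible to callers.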

Using the names of mathematical symbols that are commonly used to represent a concept is potentially confusing because those symbols are not *universally* used.  For example, students are often introduced to new concepts in introductory mathematics texts where concepts such as "mean" appear in formulas and equations not as "mu" but as "xbar" or simply "m" or other simple (and hopefully "friendly") names/symbols.  As a mathematician, if I am told a variable is named "mu", I still feel the need to ask what it represents.  Sure, I can try guessing based upon context, but I will usually have more than one guess that I could make.

Rather than continue down a path of using various mathematical-symbols-written-out-in-English-spelling, one alternative would be to use less ambiguous, more informative variable names such as "mean".  It might be worth considering a change to the parameter names of "mu" and "sigma" in NormalDist to names like "mean" and "stddev", respectively.  Or perhaps "mean" and "standard_deviation".  Or perhaps "mean" and "variance" would be easier still (recognizing that variance can be readily computed from standard deviation in this particular context).  In terms of consistency with other packages that users are likely to also use, scipy.stats functions/objects commonly refer to these concepts as "mean" and "var".
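
To make the comparison concrete, here is a rough sketch of what a call site could look like with descriptive names.  It assumes the NormalDist(mu, sigma) constructor as it appears in the statistics module; the wrapper normal_from_mean_and_variance is hypothetical and only illustrates the "mean" and "variance" spelling, not a proposed implementation.

```python
import math
from statistics import NormalDist


def normal_from_mean_and_variance(mean, variance):
    """Build a NormalDist from descriptively named arguments.

    Hypothetical wrapper for illustration only; it is not part of the
    statistics module.  The standard deviation is recovered as the
    square root of the variance.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    return NormalDist(mu=mean, sigma=math.sqrt(variance))


# Symbol-style names, as NormalDist accepts them.
by_symbols = NormalDist(mu=10.0, sigma=2.0)

# Descriptive names via the hypothetical wrapper.
by_words = normal_from_mean_and_variance(mean=10.0, variance=4.0)

print(by_symbols.mean, by_symbols.stdev)
print(by_words.mean, by_words.stdev)
```

The second call arguably needs no accompanying explanation of what each argument represents, which is the point of the suggestion.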

I like the idea of making NormalDist readily approachable for students as well as those more familiar with these concepts.  The offerings in scipy.stats are excellent but they are not always the most approachable things for new students of statistics.