Message 336417 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	davin
Recipients	davin, mark.dickinson, rhettinger, selik, steven.daprano, tim.peters, xtreak
Date	2019-02-23.23:37:55
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1550965075.57.0.309398923647.issue36018@roundup.psfhosted.org>
In-reply-to

Content
There is an inconsistency worth paying attention to in the choice of names of the input parameters. Currently in the statistics module, pvariance() accepts a parameter named "mu" and pstdev() and variance() each accept a parameter named "xbar". The docs describe both "mu" and "xbar" as "it should be the mean of data". I suggest it is worth rationalizing the names used within the statistics module for consistency before reusing "mu" or "xbar" or anything else in NormalDist. Using the names of mathematical symbols that are commonly used to represent a concept is potentially confusing because those symbols are not always universally used. For example, students are often introduced to new concepts in introductory mathematics texts where concepts such as "mean" appear in formulas and equations not as "mu" but as "xbar" or simply "m" or other simple (and hopefully "friendly") names/symbols. As a mathematician, if I am told a variable is named, "mu", I still feel the need to ask what it represents. Sure, I can try guessing based upon context but I will usually have more than one guess that I could make. Rather than continue down a path of using various mathematical-symbols-written-out-in-English-spelling, one alternative would be to use less ambiguous, more informative variable names such as "mean". It might be worth considering a change to the parameter names of "mu" and "sigma" in NormalDist to names like "mean" and "stddev", respectively. Or perhaps "mean" and "standard_deviation". Or perhaps "mean" and "variance" would be easier still (recognizing that variance can be readily computed from standard deviation in this particular context). In terms of consistency with other packages that users are likely to also use, scipy.stats functions/objects commonly refer to these concepts as "mean" and "var". I like the idea of making NormalDist readily approachable for students as well as those more familiar with these concepts. The offerings in scipy.stats are excellent but they are not always the most approachable things for new students of statistics.

There is an inconsistency worth paying attention to in the choice of names of the input parameters.

Currently in the statistics module, pvariance() accepts a parameter named "mu" and pstdev() and variance() each accept a parameter named "xbar". The docs describe both "mu" and "xbar" as "it should be the mean of data". I suggest it is worth rationalizing the names used within the statistics module for consistency before reusing "mu" or "xbar" or anything else in NormalDist.

Using the names of mathematical symbols that are commonly used to represent a concept is potentially confusing because those symbols are not always *universally* used. For example, students are often introduced to new concepts in introductory mathematics texts where concepts such as "mean" appear in formulas and equations not as "mu" but as "xbar" or simply "m" or other simple (and hopefully "friendly") names/symbols. As a mathematician, if I am told a variable is named, "mu", I still feel the need to ask what it represents. Sure, I can try guessing based upon context but I will usually have more than one guess that I could make.

Rather than continue down a path of using various mathematical-symbols-written-out-in-English-spelling, one alternative would be to use less ambiguous, more informative variable names such as "mean". It might be worth considering a change to the parameter names of "mu" and "sigma" in NormalDist to names like "mean" and "stddev", respectively. Or perhaps "mean" and "standard_deviation". Or perhaps "mean" and "variance" would be easier still (recognizing that variance can be readily computed from standard deviation in this particular context). In terms of consistency with other packages that users are likely to also use, scipy.stats functions/objects commonly refer to these concepts as "mean" and "var".

I like the idea of making NormalDist readily approachable for students as well as those more familiar with these concepts. The offerings in scipy.stats are excellent but they are not always the most approachable things for new students of statistics.

History
Date	User	Action	Args
2019-02-23 23:37:55	davin	set	recipients: + davin, tim.peters, rhettinger, mark.dickinson, steven.daprano, selik, xtreak
2019-02-23 23:37:55	davin	set	messageid: <1550965075.57.0.309398923647.issue36018@roundup.psfhosted.org>
2019-02-23 23:37:55	davin	link	issue36018 messages
2019-02-23 23:37:55	davin	create