
classification
Title: Add optional weighting to statistics.harmonic_mean()
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: ZackerySpytz, corona10, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2019-09-28 18:28 by rhettinger, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23914 merged rhettinger, 2020-12-23 23:14
PR 23919 merged ZackerySpytz, 2020-12-24 06:43
Messages (13)
msg353469 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-09-28 18:28
Currently, harmonic_mean() is difficult to use in real applications because it assumes equal weighting.  While that is sometimes true, the API precludes a broad class of applications where the weights are uneven.

That is easily remedied with an optional *weights* argument modeled after the API for random.choices():

    harmonic_mean(data, weights=None)


Examples
--------

Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds up to 60 km/hr for the remaining 30 km of the journey. What is the average speed?

    >>> harmonic_mean([40, 60], weights=[5, 30])
    56.0

Suppose an investor owns shares in each of three companies, with P/E (price/earning) ratios of 2.5, 3 and 10, and with market values of 10,000, 7,200, and 12,900 respectively.  What is the weighted average P/E ratio for the investor’s portfolio?

    >>> avg_pe = harmonic_mean([2.5, 3, 10], weights=[10_000, 7_200, 12_900])
    >>> round(avg_pe, 1)
    3.9
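
As a sanity check, both results can be reproduced from first principles without the proposed *weights* argument: average speed is total distance over total time, and portfolio P/E is total market value over total earnings. The helper name `ratio_of_totals` below is hypothetical and used only for illustration.

```python
def ratio_of_totals(rates, amounts):
    """Return sum(amounts) / sum(amount / rate).

    Hypothetical helper for illustration; not part of the statistics module.
    """
    total_amount = sum(amounts)
    total_per_rate = sum(a / r for r, a in zip(rates, amounts))
    return total_amount / total_per_rate

# Car: 35 km covered in 5/40 + 30/60 = 0.625 hours.
avg_speed = ratio_of_totals([40, 60], [5, 30])

# Portfolio: market value of 30,100 over earnings of 4,000 + 2,400 + 1,290.
avg_pe = ratio_of_totals([2.5, 3, 10], [10_000, 7_200, 12_900])

print(avg_speed)         # 56.0
print(round(avg_pe, 1))  # 3.9
```

In both cases the weights are the quantities in the numerator of the rate (kilometres, market value), which is the situation where a weighted harmonic mean is the appropriate average.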


Existing workarounds
--------------------

It is possible to use the current API for these tasks, but it is inconvenient, awkward, slow, and only works with integer ratios:

    >>> harmonic_mean([40]*5 + [60]*30)
    56.0

    >>> harmonic_mean([2.5]*10_000 + [3]*7_200 + [10]*12_900)
    3.9141742522756826
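
The integer-only limitation can be seen in a short sketch. The helper `replicate` is hypothetical; it emulates weights by repeating each value, which only works when every weight is a non-negative integer.

```python
from statistics import harmonic_mean

def replicate(data, counts):
    """Repeat each value counts[i] times (hypothetical helper)."""
    expanded = []
    for x, n in zip(data, counts):
        expanded.extend([x] * n)  # list repetition requires an int count
    return expanded

# Integer weights: replication reproduces the weighted result.
speed = harmonic_mean(replicate([40, 60], [5, 30]))

# Fractional weights (say 2.5 km and 15 km): replication is not possible.
try:
    replicate([40, 60], [2.5, 15])
    fractional_ok = True
except TypeError:
    fractional_ok = False

print(fractional_ok)  # False
```

Fractional weights would first have to be rescaled to integers by hand (2.5:15 is the same ratio as 5:30), and large weights make the expanded list correspondingly large.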


Algorithm
---------

Following the formula at https://en.wikipedia.org/wiki/Harmonic_mean#Weighted_harmonic_mean , the algorithm is straightforward:

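    def weighted_harmonic_mean(data, weights):
        # Weighted harmonic mean: sum(w) / sum(w / x)
        num = den = 0
        for x, w in zip(data, weights):
            num += w        # running total of the weights
            den += w / x    # running weighted sum of reciprocals
        return num / den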
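
A slightly more defensive variant is sketched below. It is an illustration only, not the code that was eventually merged: it uses `fractions.Fraction` for exact intermediate arithmetic instead of the module's private helpers, and the name `weighted_harmonic_mean_exact`, the error messages, and the choice to return 0.0 when a zero value carries a nonzero weight are all assumptions.

```python
from fractions import Fraction

def weighted_harmonic_mean_exact(data, weights):
    """Weighted harmonic mean with basic validation (illustrative sketch)."""
    data = list(data)
    weights = list(weights)
    if not data:
        raise ValueError("at least one data point is required")
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    if any(x < 0 for x in data) or any(w < 0 for w in weights):
        raise ValueError("negative values are not supported")

    # Fraction converts ints and floats exactly, so the sums do not
    # accumulate rounding error.
    total_weight = sum(Fraction(w) for w in weights)
    if total_weight == 0:
        raise ValueError("weights must sum to a positive value")

    # Assumption: a zero value with nonzero weight drives the mean to zero.
    if any(w and x == 0 for x, w in zip(data, weights)):
        return 0.0

    # Skip zero-weight entries so they never cause a division by zero.
    recip = sum(Fraction(w) / Fraction(x) for x, w in zip(data, weights) if w)
    return float(total_weight / recip)

print(weighted_harmonic_mean_exact([40, 60], [5, 30]))  # 56.0
```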


PR
--

If you're open to this suggestion, I'll work up a PR modeled after the existing code, using _sum() and _fail_neg() for exactness and data validity checks.
msg353526 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-09-30 03:32
Sounds like a great idea to me.

Thanks,

Steven
msg353550 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2019-09-30 07:41
Great idea!
msg353985 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2019-10-05 02:50
@rhettinger
If you are okay with it, can I proceed with this issue under your guidance?
msg353992 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-10-05 05:16
> Can I proceed with this issue under your guidance?

Thank you, but this is one I would like to do myself.  I've already done work on it and would like to wrap it up (also, it's more complicated than it seems because the supporting functions are a bit awkward to use in this context).
msg353999 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2019-10-05 08:58
Oh.. Thank you for letting me know
msg383664 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-12-24 00:53
I like the addition but I'm not sure why you removed the price-earnings ratio example from the docs. I think that it's useful to have an example that shows that harmonic mean is not *just* for speed-related problems.

I'm not going to reject your change just on this documentation issue, but I would like to hear why you removed the P/E example instead of just adding additional examples.
msg383665 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-24 02:22
> I would like to hear why you removed the P/E example
> instead of just adding additional examples.

I tried out the existing P/E example in my Python courses and found that it had very little explanatory power — in general, non-finance people know less about P/E ratios than they know about the harmonic mean :-)  

For people with a finance background who already understand P/E ratios, the example is weak.  The current example only works mathematically if the portfolios have exactly the same market value at the time the ratios are combined, which never happens.  Also, P/E ratios in real portfolios include zero and negative values, which won't work with our harmonic mean.  Also, combining P/Es for non-homogeneous securities is a bit of a dark art.  Given a utility stock, a healthcare stock, and a tech stock, the aggregate P/E is rarely comparable to anything else.

All that said, I would be happy to add the example back if you think it is necessary.  It's your module and it's important that you're happy with it :-)

> I think that it's useful to have an example that shows that 
> harmonic mean is not *just* for speed-related problems.

I considered using a resistors-in-parallel example, but that is somewhat specialized and isn't directly applicable because we normally don't want a mean at all; we just want the equivalent resistance.
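
To make the relationship concrete: for n resistors in parallel, the equivalent resistance is the harmonic mean divided by n. The sketch below is illustrative only, and the function name `parallel_resistance` is hypothetical.

```python
import math
from statistics import harmonic_mean

def parallel_resistance(resistances):
    """Equivalent resistance of parallel resistors via the harmonic mean."""
    resistances = list(resistances)
    return harmonic_mean(resistances) / len(resistances)

r_eq = parallel_resistance([100, 400])  # ohms
direct = 1 / (1 / 100 + 1 / 400)        # textbook formula

# Both approaches give 80 ohms, up to floating-point rounding.
print(math.isclose(r_eq, direct))  # True
```

The extra division by n is the reason the example does not map cleanly onto a "mean" in the usual sense.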

I also thought about adding something like: "The harmonic mean is the smallest of the three Pythagorean means and tends to emphasize the impact of small outliers while minimizing the impact of large outliers."  But while this is true, I've never seen a data scientist switch from an arithmetic mean to a harmonic mean to achieve this effect.
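
The ordering of the three Pythagorean means, and the outlier behaviour described in the quoted sentence, can be checked with a small sketch. The sample data is made up for illustration; `geometric_mean` requires Python 3.8 or later.

```python
from statistics import mean, geometric_mean, harmonic_mean

data = [2, 4, 8]  # made-up positive sample
am = mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)

# For positive data: harmonic <= geometric <= arithmetic.
print(hm <= gm <= am)  # True

# A large outlier moves the arithmetic mean a lot, the harmonic mean a little.
am_big = mean(data + [1000])
hm_big = harmonic_mean(data + [1000])

# A small outlier has the opposite effect: the harmonic mean collapses toward it.
am_small = mean(data + [0.01])
hm_small = harmonic_mean(data + [0.01])
```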
msg383667 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-12-24 02:42
Okay, I'm satisfied with that reasoning, thanks Raymond.

Patch looks good to me. Go for it!

Have a good Christmas and stay safe.
msg383671 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-24 03:52
New changeset cc3467a57b61b0e7ef254b36790a1c44b13f2228 by Raymond Hettinger in branch 'master':
bpo-38308: Add optional weighting to statistics.harmonic_mean() (GH-23914)
https://github.com/python/cpython/commit/cc3467a57b61b0e7ef254b36790a1c44b13f2228
msg383672 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-12-24 03:53
Thank you and Merry Christmas.
msg383677 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2020-12-24 07:11
The "versionchanged" for *weights* should be 3.10, not 3.8.  I've created PR 23919 to fix this.
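
Because the *weights* argument is only available from Python 3.10, code that must also run on older interpreters needs a guard. The following is a minimal sketch under that assumption; the wrapper name `harmonic_mean_compat` is hypothetical, and the fallback branch does no validation and does not handle zero values.

```python
import sys
from statistics import harmonic_mean

def harmonic_mean_compat(data, weights=None):
    """Call harmonic_mean() with weights where supported (illustrative sketch)."""
    if weights is None:
        return harmonic_mean(data)
    if sys.version_info >= (3, 10):
        return harmonic_mean(data, weights=weights)
    # Fallback for older versions: plain float arithmetic, no validation.
    data, weights = list(data), list(weights)
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    return sum(weights) / sum(w / x for x, w in zip(data, weights))

result = harmonic_mean_compat([40, 60], weights=[5, 30])
```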
msg384268 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-03 12:35
New changeset 66136768615472a8d1a18b5018095b9737dbab8c by Zackery Spytz in branch 'master':
bpo-38308: Fix the "versionchanged" for the *weights* of harmonic_mean() (GH-23919)
https://github.com/python/cpython/commit/66136768615472a8d1a18b5018095b9737dbab8c
History
Date                 User              Action  Args
2022-04-11 14:59:20  admin             set     github: 82489
2021-01-03 12:35:29  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg384268
2020-12-24 07:11:59  ZackerySpytz      set     messages: + msg383677; versions: + Python 3.10, - Python 3.9
2020-12-24 06:43:41  ZackerySpytz      set     nosy: + ZackerySpytz; pull_requests: + pull_request22770
2020-12-24 03:53:17  rhettinger        set     status: open -> closed; resolution: fixed; messages: + msg383672; stage: patch review -> resolved
2020-12-24 03:52:22  rhettinger        set     messages: + msg383671
2020-12-24 02:42:46  steven.daprano    set     messages: + msg383667
2020-12-24 02:22:46  rhettinger        set     messages: + msg383665
2020-12-24 00:53:32  steven.daprano    set     messages: + msg383664
2020-12-23 23:17:12  rhettinger        set     assignee: steven.daprano -> rhettinger
2020-12-23 23:14:45  rhettinger        set     keywords: + patch; stage: patch review; pull_requests: + pull_request22765
2019-10-05 08:58:13  corona10          set     messages: + msg353999
2019-10-05 05:16:48  rhettinger        set     messages: + msg353992
2019-10-05 02:50:54  corona10          set     messages: + msg353985
2019-09-30 07:41:57  corona10          set     nosy: + corona10; messages: + msg353550
2019-09-30 03:32:32  steven.daprano    set     messages: + msg353526
2019-09-29 11:35:33  mark.dickinson    set     nosy: + mark.dickinson
2019-09-28 18:28:15  rhettinger        create