Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional weighting to statistics.harmonic_mean() #82489

Closed
rhettinger opened this issue Sep 28, 2019 · 13 comments
Closed

Add optional weighting to statistics.harmonic_mean() #82489

rhettinger opened this issue Sep 28, 2019 · 13 comments
Assignees
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

BPO 38308
Nosy @rhettinger, @mdickinson, @stevendaprano, @serhiy-storchaka, @corona10, @ZackerySpytz
PRs
  • bpo-38308: Add optional weighting to statistics.harmonic_mean() #23914
  • bpo-38308: Fix the "versionchanged" for the *weights* of harmonic_mean() #23919
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2020-12-24.03:53:17.407>
    created_at = <Date 2019-09-28.18:28:15.804>
    labels = ['type-feature', 'library', '3.10']
    title = 'Add optional weighting to statistics.harmonic_mean()'
    updated_at = <Date 2021-01-03.12:35:29.887>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2021-01-03.12:35:29.887>
    actor = 'serhiy.storchaka'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2020-12-24.03:53:17.407>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2019-09-28.18:28:15.804>
    creator = 'rhettinger'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38308
    keywords = ['patch']
    message_count = 13.0
    messages = ['353469', '353526', '353550', '353985', '353992', '353999', '383664', '383665', '383667', '383671', '383672', '383677', '384268']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'steven.daprano', 'serhiy.storchaka', 'corona10', 'ZackerySpytz']
    pr_nums = ['23914', '23919']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue38308'
    versions = ['Python 3.10']

    @rhettinger
    Copy link
    Contributor Author

    Currently, harmonic_mean() is difficult to use in real applications because it assumes equal weighting. While that is sometimes true, the API precludes a broad class of applications where the weights are uneven.

    That is easily remedied with an optional *weights* argument modeled after the API for random.choices():

        harmonic_mean(data, weights=None)

    Examples
    --------

    Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds-up to 60 km/hr for the remaining 30 km of the journey. What is the average speed?

        >>> harmonic_mean([40, 60], weights=[5, 30])
        56.0

    Suppose an investor owns shares in each of three companies, with P/E (price/earning) ratios of 2.5, 3 and 10, and with market values of 10,000, 7,200, and 12,900 respectively. What is the weighted average P/E ratio for the investor’s portfolio?

        >>> avg_pe = harmonic_mean([2.5, 3, 10], weights=[10_000, 7_200, 12_900])
        >>> round(avg_pe, 1)
        3.9

    Existing workarounds
    --------------------

    It is possible to use the current API for theses tasks, but it is inconvenient, awkward, slow, and only works with integer ratios:

        >>> harmonic_mean([40]*5 + [60]*30)
        56.0
    
        >>> harmonic_mean([2.5]*10_000 + [3]*7_200 + [10]*12_900)
        3.9141742522756826

    Algorithm
    ---------

    Following the formula at https://en.wikipedia.org/wiki/Harmonic_mean#Weighted_harmonic_mean , the algorithm is straight forward:

        def weighted_harmonic_mean(data, weights):
            num = den = 0
            for x, w in zip(data, weights):
                num += w
                den += w / x
            return num / den

    PR

    If you're open to this suggestion, I'll work-up a PR modeled after the existing code and that uses _sum() and _fail_neg() for exactness and data validity checks.

    @rhettinger rhettinger added 3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Sep 28, 2019
    @stevendaprano
    Copy link
    Member

    Sounds like a great idea to me.

    Thanks,

    Steven

    @corona10
    Copy link
    Member

    Great idea!

    @corona10
    Copy link
    Member

    corona10 commented Oct 5, 2019

    @rhettinger
    If you are okay, Can I process with this issue with your guide?

    @rhettinger
    Copy link
    Contributor Author

    Can I process with this issue with your guide?

    Thank you, but this is one I would like to do myself. I've already done work on it and would like to wrap it up (also, it's more complicated than it seems because the supporting functions are a bit awkward to use in this context).

    @corona10
    Copy link
    Member

    corona10 commented Oct 5, 2019

    Oh.. Thank you for letting me know

    @stevendaprano
    Copy link
    Member

    I like the addition but I'm not sure why you removed the price-earnings ratio example from the docs. I think that it's useful to have an example that shows that harmonic mean is not *just* for speed-related problems.

    I'm not going to reject your change just on this documentation issue, but I would like to hear why you removed the P/E example instead of just adding additional examples.

    @rhettinger
    Copy link
    Contributor Author

    I would like to hear why you removed the P/E example
    instead of just adding additional examples.

    I tried out the existing P/E example in my Python courses and found that it had very little explanatory power — in general, non-finance people know less about P/E ratios than they know about the harmonic mean :-)

    For people with a finance background who do already understand P/E ratios, the example is weak. The current example only works mathematically if the portfolios are exactly the same market value at the time the ratios are combined — this never happens. Also P/E ratios in real portfolios include zero and negative values — that won't work with our harmonic mean. Also, combining P/Es for non-homogenous securities is a bit of dark art. Given a utility stock, a healthcare stock, and a tech stock, the aggregate P/E is rarely comparable to anything else.

    All that said, I would be happy to add the example back if you think it is necessary. It's your module and it's important that you're happy with it :-)

    I think that it's useful to have an example that shows that
    harmonic mean is not *just* for speed-related problems.

    I considered using a resistors in parallel example, but that is somewhat specialized and isn't directly applicable because we normally don't want a mean at all, we just want the equivalent resistance.

    I also thought about adding something like: "The harmonic mean is the smaller of the three Pythagorean means and tends to emphasize the impact of small outliers while minimizing the impact of large outliers." But while this is true, I've never seen a data scientist switch from an arithmetic mean to a harmonic mean to achieve this effect.

    @stevendaprano
    Copy link
    Member

    Okay, I'm satisfied with that reasoning, thanks Raymond.

    Patch looks good to me. Go for it!

    Have a good Christmas and stay safe.

    @rhettinger
    Copy link
    Contributor Author

    New changeset cc3467a by Raymond Hettinger in branch 'master':
    bpo-38308: Add optional weighting to statistics.harmonic_mean() (GH-23914)
    cc3467a

    @rhettinger
    Copy link
    Contributor Author

    Thank you and Merry Christmas.

    @ZackerySpytz
    Copy link
    Mannequin

    ZackerySpytz mannequin commented Dec 24, 2020

    The "versionchanged" for *weights* should be 3.10, not 3.8. I've created PR 23919 to fix this.

    @ZackerySpytz ZackerySpytz mannequin added 3.10 only security fixes and removed 3.9 only security fixes labels Dec 24, 2020
    @serhiy-storchaka
    Copy link
    Member

    New changeset 6613676 by Zackery Spytz in branch 'master':
    bpo-38308: Fix the "versionchanged" for the *weights* of harmonic_mean() (GH-23919)
    6613676

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants