statistics module - incorrect results with boolean input #68256

Closed
wm75 mannequin opened this issue Apr 28, 2015 · 6 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@wm75
Mannequin

wm75 mannequin commented Apr 28, 2015

BPO 24068
Nosy @rhettinger, @mdickinson, @stevendaprano, @bitdancer, @wm75
Files
  • statistics._sum.patch
  • statistics._sum.v2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/stevendaprano'
    closed_at = <Date 2018-04-08.20:03:14.377>
    created_at = <Date 2015-04-28.08:53:26.540>
    labels = ['type-bug', 'library']
    title = 'statistics module - incorrect results with boolean input'
    updated_at = <Date 2018-04-08.20:03:14.376>
    user = 'https://github.com/wm75'

    bugs.python.org fields:

    activity = <Date 2018-04-08.20:03:14.376>
    actor = 'wolma'
    assignee = 'steven.daprano'
    closed = True
    closed_date = <Date 2018-04-08.20:03:14.377>
    closer = 'wolma'
    components = ['Library (Lib)']
    creation = <Date 2015-04-28.08:53:26.540>
    creator = 'wolma'
    dependencies = []
    files = ['39221', '39269']
    hgrepos = []
    issue_num = 24068
    keywords = ['patch']
    message_count = 6.0
    messages = ['242169', '242362', '242370', '242428', '242451', '315095']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'steven.daprano', 'r.david.murray', 'della', 'wolma']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24068'
    versions = ['Python 3.4', 'Python 3.5']

    @wm75
    Mannequin Author

    wm75 mannequin commented Apr 28, 2015

    The mean function in the statistics module gives nonsensical results when the input contains boolean values, e.g.:

    >>> mean([True, True, False, False])
    0.25
    
    >>> mean([True, 1027])
    0.5

    This is an issue with the module's internal _sum function that mean relies on. Other functions relying on _sum are affected more subtly, e.g.:

    >>> variance([1, 1027, 0])
    351234.3333333333
    
    >>> variance([True, 1027, 0])
    351234.3333333334

    The problem is that _sum tries to coerce its result to any non-int type found in the input (bool in these examples), but bool(1028) is just True, so information is lost.
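
    For illustration, here is a minimal sketch of the failure mode described above. The names `lossy_sum` and `lossy_mean` are hypothetical stand-ins written for this example; they are not the module's actual `_sum` or `mean`, and only mimic the coercion step as the report describes it.

```python
from fractions import Fraction


def lossy_sum(data):
    """Hypothetical stand-in for the coercion described above (not the real _sum)."""
    # Add the values exactly, so the running total itself is correct.
    total = sum(Fraction(x) for x in data)
    # Use the last type that is not exactly int as the result type.
    result_type = int
    for x in data:
        if type(x) is not int:
            result_type = type(x)
    # Casting to bool collapses any non-zero total to True.
    return result_type(total)


def lossy_mean(data):
    """Divide the (possibly collapsed) sum by the number of items."""
    return lossy_sum(data) / len(data)


print(lossy_sum([True, 1027]))                 # True, although the exact total is 1028
print(lossy_mean([True, 1027]))                # 0.5
print(lossy_mean([True, True, False, False]))  # 0.25
```

    Both means match the wrong values reported above: once the total has been cast to bool it is worth at most 1, so the division sees 1 instead of 1028 or 2.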

    I've attached a patch that skips the type cast when the target type would be bool.
    I don't have time to write a separate test, though, so if somebody wants to take over ... :)
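
    The contents of the attached patch are not shown in this thread, so the following is only a sketch of the idea it describes, under the assumption that bool can simply be treated like int when the result type is chosen. `bool_safe_sum` and `bool_safe_mean` are hypothetical names for this example, not functions from the statistics module.

```python
from fractions import Fraction


def bool_safe_sum(data):
    """Hypothetical sketch: never pick bool as the type to cast the total to."""
    # Add the values exactly; bool values count as 0 or 1.
    total = sum(Fraction(x) for x in data)
    result_type = int
    for x in data:
        t = type(x)
        # Assumption: bool is handled like int, so it never becomes the result type.
        if t is not int and t is not bool:
            result_type = t
    return result_type(total)


def bool_safe_mean(data):
    """Divide the bool-safe sum by the number of items."""
    return bool_safe_sum(data) / len(data)


print(bool_safe_sum([True, 1027]))                 # 1028
print(bool_safe_mean([True, 1027]))                # 514.0
print(bool_safe_mean([True, True, False, False]))  # 0.5
```

    With the cast to bool skipped, the total keeps its full value and the means come out as expected.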

    @wm75 wm75 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 28, 2015
    @bitdancer
    Member

    I wonder if it would be better to reject Bool data in this context? Bool is only a numeric type for historical reasons.

    @stevendaprano
    Member

    The patch seems simple and straightforward enough. It just needs some tests, and a Round Tuit.

    @stevendaprano stevendaprano self-assigned this May 2, 2015
    @wm75
    Mannequin Author

    wm75 mannequin commented May 2, 2015

    Uploading an alternative, possibly slightly clearer, version of the patch.

    @mdickinson
    Member

    > I wonder if it would be better to reject Bool data in this context?

    It's not uncommon (and quite useful) in NumPy world to compute basic statistics on arrays of boolean dtype: the sum of such an array gives a count of the Trues, and the mean gives the proportion of True entries. I think it would be handy to allow the statistics module to work with lists of bools, if possible.
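
    The same idiom can be illustrated in plain Python, without NumPy, because bool is a subclass of int and the built-in sum treats True as 1 and False as 0. This is only a small sketch; the variable names are made up for the example.

```python
# A list of boolean flags, as in the first example of this report.
flags = [True, True, False, False]

# Summing booleans counts the True entries.
count_true = sum(flags)

# Dividing by the length gives the proportion of True entries.
proportion_true = count_true / len(flags)

print(count_true)       # 2
print(proportion_true)  # 0.5
```

    The proportion 0.5 is the value one would expect mean to return for this input, rather than the 0.25 reported above.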

    @wm75
    Mannequin Author

    wm75 mannequin commented Apr 8, 2018

    Fixed as part of resolving bpo-25177.

    @wm75 wm75 mannequin closed this as completed Apr 8, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022