Issue 36169: Add overlap() method to statistics.NormalDist()

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80350

classification

Title:	Add overlap() method to statistics.NormalDist()
Type:		Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	davin, mark.dickinson, rhettinger, steven.daprano, tim.peters
Priority:	normal	Keywords:	patch

Created on 2019-03-02 23:04 by rhettinger, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 12149	merged	rhettinger, 2019-03-03 21:58

Messages (3)
msg337020 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2019-03-02 23:04
------ How to use it ------

What percentage of men and women will have the same height in two normally distributed populations with known means and standard deviations?

    # http://www.usablestats.com/lessons/normal
    >>> from statistics import NormalDist
    >>> men = NormalDist(70, 4)
    >>> women = NormalDist(65, 3.5)
    >>> men.overlap(women)
    0.5028719270195425

The result can be confirmed empirically with a Monte Carlo simulation:

    >>> from collections import Counter
    >>> n = 100_000
    >>> overlap = Counter(map(round, men.samples(n))) & Counter(map(round, women.samples(n)))
    >>> sum(overlap.values()) / n
    0.50349

The result can also be confirmed by numeric integration of the probability density function:

    >>> dx = 0.10
    >>> heights = [h * dx for h in range(500, 860)]
    >>> sum(min(men.pdf(h), women.pdf(h)) for h in heights) * dx
    0.5028920586287203

------ Code ------

    def overlap(self, other):
        '''Compute the overlap coefficient (OVL) between two normal distributions.

        Measures the agreement between two normal probability distributions.
        Returns a value between 0.0 and 1.0 giving the overlapping area in
        the two underlying probability density functions.

        '''
        # See: "The overlapping coefficient as a measure of agreement between
        # probability distributions and point estimation of the overlap of two
        # normal densities" -- Henry F. Inman and Edwin L. Bradley Jr
        # http://dx.doi.org/10.1080/03610928908830127
        # Also see:
        # http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf
        if not isinstance(other, NormalDist):
            return NotImplemented
        X, Y = self, other
        X_var, Y_var = X.variance, Y.variance
        if not X_var or not Y_var:
            raise StatisticsError('overlap() not defined when sigma is zero')
        dv = Y_var - X_var
        if not dv:
            # Equal variances: the densities cross at a single point.
            return 2.0 * NormalDist(fabs(Y.mu - X.mu), 2.0 * X.sigma).cdf(0)
        # Unequal variances: the densities cross at two points, x1 and x2.
        a = X.mu * Y_var - Y.mu * X_var
        b = X.sigma * Y.sigma * sqrt((X.mu - Y.mu)**2 + dv * log(Y_var / X_var))
        x1 = (a + b) / dv
        x2 = (a - b) / dv
        return 1.0 - (fabs(Y.cdf(x1) - X.cdf(x1)) + fabs(Y.cdf(x2) - X.cdf(x2)))

---- Future ----

The concept of an overlap coefficient (OVL) is not specific to normal distributions, so it is possible to extend this idea to work with other distributions if needed.
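To illustrate the point about other distributions, here is a minimal sketch of a distribution-agnostic OVL estimate. It is not part of the patch: numeric_overlap and exponential_pdf are hypothetical helper names, the integration bounds and step count are arbitrary choices, and the midpoint rule stands in for whatever closed form a given distribution might admit. It assumes Python 3.8 or later, where NormalDist.overlap() is available for comparison.

```python
from math import exp
from statistics import NormalDist


def numeric_overlap(pdf_a, pdf_b, lo, hi, steps=100_000):
    """Approximate the overlapping coefficient of two densities on [lo, hi].

    Hypothetical helper (not in the statistics module): integrates
    min(pdf_a(x), pdf_b(x)) with the midpoint rule.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx          # midpoint of the i-th slice
        total += min(pdf_a(x), pdf_b(x))
    return total * dx


def exponential_pdf(rate):
    """Return the density function of an exponential distribution."""
    return lambda x: rate * exp(-rate * x) if x >= 0 else 0.0


# Cross-check against the closed form for two normal distributions.
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)
approx = numeric_overlap(men.pdf, women.pdf, 30.0, 110.0)
exact = men.overlap(women)               # requires Python 3.8+

# A non-normal pair: exponential densities with rates 1 and 2.
# They cross at x = ln 2, so the overlap is analytically 0.5 + 0.25 = 0.75.
exp_overlap = numeric_overlap(exponential_pdf(1.0), exponential_pdf(2.0), 0.0, 50.0)

print(f"normal: numeric={approx:.6f} closed form={exact:.6f}")
print(f"exponential: numeric={exp_overlap:.6f}")
```

The numeric route costs one pdf evaluation per slice for each distribution, so the closed form above remains preferable whenever both inputs are normal.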
msg337026 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2019-03-03 06:16
Another cross-check can be had with this nomogram: https://www.rasch.org/rmt/rmt101r.htm
msg337367 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2019-03-07 06:59
New changeset 318d537daabf2bd5f781255c7e25bfce260cf227 by Raymond Hettinger in branch 'master': bpo-36169: Add overlap() method to statistics.NormalDist (GH-12149) https://github.com/python/cpython/commit/318d537daabf2bd5f781255c7e25bfce260cf227

History
Date	User	Action	Args
2022-04-11 14:59:11	admin	set	github: 80350
2019-03-07 07:00:03	rhettinger	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-03-07 06:59:43	rhettinger	set	messages: + msg337367
2019-03-03 21:58:37	rhettinger	set	keywords: + patch stage: patch review pull_requests: + pull_request12149
2019-03-03 06:16:31	rhettinger	set	messages: + msg337026
2019-03-02 23:04:32	rhettinger	create