Issue 26040: Improve coverage and rigour of test.test_math

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70228

classification

Title:	Improve coverage and rigour of test.test_math
Type:	behavior	Stage:	resolved
Components:	Tests	Versions:	Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	christian.heimes, eric.smith, jeff.allen, koobs, lemburg, mark.dickinson, ned.deily, python-dev, serhiy.storchaka, stutzbach
Priority:	normal	Keywords:	patch

Created on 2016-01-07 15:41 by jeff.allen, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
iss26040.patch	jeff.allen, 2016-03-14 22:57	Patch that improves coverage and addresses the uneven accuracy	review
extra_cmath_testcases.py	jeff.allen, 2016-03-17 20:37	Generate the extra test cases proposed for cmath_testcases.txt
stat_math.py	jeff.allen, 2016-03-17 20:38	Demonstrates coverage and accuracy problems in test_math v3
iss26040_v2.patch	mark.dickinson, 2016-08-29 11:14		review
iss26040_v3.patch	mark.dickinson, 2016-08-29 12:23		review
iss26040_v4.patch	mark.dickinson, 2016-08-29 12:42		review

Messages (20)
msg257695 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-01-07 15:41
In many cases, test_math does not exercise the functions defined by the math module, other than at a few values needing special handling. Where it does test a function with values representative of the domain, the accuracy demanded is often loose in one place and tight in another. By comparison, test_cmath uses more interesting values and a rational approach to accuracy. In most implementations, math simply proxies an expertly built (and tested) platform library or firmware, so it is unlikely we are the victims of mathematical inexactitude. We observed this as a problem in the Jython project, where failures in cmath were traced eventually to the naive (but passing) implementation of some functions in math. For Jython, we now supplement test_math with stronger tests more like those in test_cmath. So this issue is really an offer to fold that work back into the CPython test_math, to strengthen testing for all. Coverage: -------- In the attached program, in the coverge section, I logged the calls to selected functions from test_math to see what range of values they used. It produces results like: sqrt [9.881e-324, 7.905e-323, 1.862e-309, 1e-305, 1e-150, ..., 2, 23, 1e+16, 1e+150, 1e+299] [-1] [-0.0, -inf, 0.0, inf, nan] sin [1.571] [-1.571] [-0.0, -inf, 0.0, inf, nan] exp [] [-745] [-0.0, -inf, 0.0, inf, nan] sinh [] [] [-0.0, -inf, 0.0, inf, nan] The three rows here are positive, negative and "special" arguments, and "..." means I abridged the list. Accuracy: -------- Where functions are tested, results are expected with a maximum error of 1e-5. On sqrt(1e150), this demands bit-perfect accuracy; on sqrt(1e-150) it demands virtually nothing. In the attached program, in the accuracy section, I wrap most of the functions so as to fuzz their results by plus or minus 2e-6, test_math will still pass them. test_cmath uses a relative measure of error (so many ulps of the expected value), which is worth emulating. If people think it better, coverage and accuracy can be tracked as separate issues.
msg261789 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-03-14 22:57
Here is a patch that improves coverage and addresses the uneven accuracy. Required accuracy is now specified in ulps. Mostly, I have choses 1 ulp, since this passed for me on an x86 architecture (and also ARM), but this may be too ambitious. I have also responded to the comment relating to erfc: # XXX Would be better to weaken this test only # for large x, instead of for all x." I found I could not contribute the code I used to generate the additional test cases in Tools/scripts without failing test_tools. (It complained of a missing dependency. The generator uses mpmath.)
msg261855 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-03-16 15:05
Big +1 for the idea, and thank you for doing this work. I can't promise to find time to do a thorough review in the near future (though I'll try).
msg261933 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-03-17 20:51
Thanks for the prompt acknowledgement and for accepting this to review. I have updated the coverage & tolerance demo program. Usage in the comments (in v3). I have also added the program I used to generate the extra test cases (needs mpmath -- easier to get working than mpf in the original Windows/Jython environment).
msg273472 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-23 15:59
I finally have some time to look at this. I've double-checked all the new cmath testcases against MPFR (via bigfloat), and they all look good. I plan to apply this in two stages: (1) apply the new cmath testcases, (2) apply the test_math changes, and (1.5) watch for buildbot failures in-between.
msg273484 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-08-23 16:34
New changeset 1017215f5492 by Mark Dickinson in branch 'default': Issue #26040 (part 1): add new testcases to cmath_testcases.txt. Thanks Jeff Allen. https://hg.python.org/cpython/rev/1017215f5492
msg273493 - (view)	Author: Ned Deily (ned.deily) *	Date: 2016-08-23 17:29
Mark, many buildbots are unhappy. For example: ====================================================================== FAIL: test_testfile (test.test_math.MathTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_math.py", line 1106, in test_testfile self.ftest("%s:%s(%r)" % (id, fn, ar), result, er) File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_math.py", line 194, in ftest (name, value, expected)) AssertionError: cosh0063:cosh(709.7827) returned 8.988349783319008e+307, expected 8.988349783319007e+307
msg273507 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-23 18:55
Thanks, Ned. (And apologies.) The test_math tests are too strict in this case, which is one of the issues that Jeff's full patch fixes. It looks like my strategy of applying the patch in two pieces isn't going to work. I'll revert the change for now, and have another go at this tomorrow.
msg273509 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-23 19:02
Reverted in https://hg.python.org/cpython/rev/4389aa3507c5
msg273767 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-08-27 08:56
Mark: Thanks for validating the additional cases so carefully. If you still want to apply it in stages then I suppose the change to the comparison logic could go first (untested idea), although that's also where I could most easily have made a mistake.
msg273853 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-29 11:14
iss26040.patch didn't apply cleanly to master (because of the recent math.tau addition). Here's an updated patch.
msg273856 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-29 12:23
I've reviewed the test_math portion of the patch (which looks good, thank you!), and made a few revisions in iss26040_v3.patch. 1. I've rewritten the ulp() function to use struct. Same behaviour as before, but this way feels safer, since it doesn't rely on the exact form of the float.hex format. 2. I've reworked the ulp-based comparison to use the old to_ulps function, instead of basing it on ulp(). That's mostly just a personal preference: I prefer the symmetry of the abs(to_ulps(got) - to_ulps(expected)) form, and I think it behaves slightly better at or near powers of two. (If expected == 1.0 and got == 1.0 - 2**-51, I want that to be seen as 4 ulps error rather than 2.) 3. I increased the default ulp tolerance to 5 ulps, and increased the ulps tolerance for some of the special cases. There are some badly behaved libm implementations out there, and I don't want to break more buildbots than necessary.
msg273857 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-08-29 12:42
One more version; increased the default ulp_tol to 5 everywhere, and made some minor style fixes in nearby code.
msg273872 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-08-29 16:56
Mark: Thanks for doing my homework. Points 1 and 3 I can readily agree with. I must take another look at to_ulps() with your patch on locally. I used the approach I did because I thought it was incorrect in exactly those corners where you prefer it. I'll take a closer look.
msg273874 - (view)	Author: Jeff Allen (jeff.allen) *	Date: 2016-08-29 17:54
Ah, cunning: I can make sense of it in hex. >>> hex(to_ulps(expected)) '0x3ff0000000000000' >>> hex(to_ulps(got)) '0x3feffffffffffffc' >>> hex( to_ulps(got) - to_ulps(expected) ) '-0x4' ... and what you've done with ulp then follows. In my version a format like "{:d} ulps" was a bad idea when the error was a gross one, but your to_ulps is only piece-wise linear -- large differences are compressed. I'm pleased my work has mostly survived: here's hoping the house build-bots agree. erfc() is perhaps the last worry, but math & cmath pass on my machine.
msg274320 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-09-03 18:30
New changeset e3dbe8b7279a by Mark Dickinson in branch 'default': Issue #26040: Improve test_math and test_cmath coverage and rigour. Thanks Jeff Allen. https://hg.python.org/cpython/rev/e3dbe8b7279a
msg274321 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-09-03 18:32
iss26040_v4.patch applied. I'm watching the buildbots; if everything looks good there, I'll close the issue.
msg274324 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-09-03 20:24
Buildbots seem happy (or at least, no more unhappy than before). Closing.
msg274346 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-09-04 07:08
> or at least, no more unhappy than before Not quite true: test_math and test_cmath are failing for the x86 Tiger 3.x buildbot: ====================================================================== FAIL: test_testfile (test.test_math.MathTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_math.py", line 1190, in test_testfile '\n '.join(failures)) AssertionError: Failures in test_testfile: tan0064: tan(1.5707963267948961): expected 1978937966095219.0, got 1978945885716843.0 (error = 7.92e+09 (31678486496 ulps); permitted error = 0 or 5 ulps) ====================================================================== FAIL: test_specific_values (test.test_cmath.CMathTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_cmath.py", line 398, in test_specific_values msg=error_message) File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_cmath.py", line 147, in rAssertAlmostEqual '{!r} and {!r} are not sufficiently close'.format(a, b)) AssertionError: tan0064: tan(complex(1.5707963267948961, 0.0)) Expected: complex(1978937966095219.0, 0.0) Received: complex(1978945885716843.0, 0.0) Received value insufficiently close to expected value. It's probably not worth trying to fix this for such an ancient version of OS X. Re-opening; I'll aim to skip the test on the platform.
msg274351 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2016-09-04 08:36
Opened #27953 to track the issue with tan on OS X Tiger. Re-closing here.

History
Date	User	Action	Args
2022-04-11 14:58:25	admin	set	github: 70228
2016-09-04 08:36:23	mark.dickinson	set	status: open -> closed messages: + msg274351
2016-09-04 07:08:11	mark.dickinson	set	status: closed -> open messages: + msg274346
2016-09-03 20:24:00	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg274324 stage: resolved
2016-09-03 18:32:39	mark.dickinson	set	messages: + msg274321
2016-09-03 18:30:36	python-dev	set	messages: + msg274320
2016-08-29 17:54:16	jeff.allen	set	messages: + msg273874
2016-08-29 16:56:23	jeff.allen	set	messages: + msg273872
2016-08-29 12:42:51	mark.dickinson	set	files: + iss26040_v4.patch messages: + msg273857
2016-08-29 12:23:29	mark.dickinson	set	files: + iss26040_v3.patch messages: + msg273856
2016-08-29 11:14:14	mark.dickinson	set	files: + iss26040_v2.patch messages: + msg273853
2016-08-27 08:56:21	jeff.allen	set	messages: + msg273767
2016-08-23 19:02:33	mark.dickinson	set	messages: + msg273509
2016-08-23 18:55:57	mark.dickinson	set	messages: + msg273507
2016-08-23 17:29:35	ned.deily	set	nosy: + ned.deily messages: + msg273493
2016-08-23 17:24:27	koobs	set	nosy: + koobs
2016-08-23 16:34:06	python-dev	set	nosy: + python-dev messages: + msg273484
2016-08-23 15:59:23	mark.dickinson	set	messages: + msg273472
2016-08-23 15:00:35	mark.dickinson	set	assignee: mark.dickinson
2016-03-17 20:51:12	jeff.allen	set	messages: + msg261933
2016-03-17 20:38:50	jeff.allen	set	files: - stat_math.py
2016-03-17 20:38:36	jeff.allen	set	files: + stat_math.py
2016-03-17 20:37:52	jeff.allen	set	files: + extra_cmath_testcases.py
2016-03-17 19:52:24	jeff.allen	set	files: - stat_math.py
2016-03-17 19:51:33	jeff.allen	set	files: + stat_math.py
2016-03-16 15:05:02	mark.dickinson	set	messages: + msg261855
2016-03-15 13:31:41	serhiy.storchaka	set	nosy: + lemburg, eric.smith, christian.heimes, stutzbach, serhiy.storchaka versions: + Python 3.5
2016-03-14 22:57:11	jeff.allen	set	files: + iss26040.patch keywords: + patch messages: + msg261789
2016-01-07 15:41:22	jeff.allen	create