Issue 7632: dtoa.c: oversize b in quorem, and a menagerie of other bugs


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51881

classification

Title:	dtoa.c: oversize b in quorem, and a menagerie of other bugs
Type:	crash	Stage:	needs patch
Components:	Interpreter Core	Versions:	Python 3.1, Python 3.2, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	benjamin.peterson, eric.smith, georg.brandl, mark.dickinson, skrah, tim.peters
Priority:	release blocker	Keywords:	patch

Created on 2010-01-04 12:50 by skrah, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue7632.patch	mark.dickinson, 2010-01-08 17:10
issue7632_v2.patch	mark.dickinson, 2010-01-09 16:47
test_dtoa.py	mark.dickinson, 2010-01-09 16:54	Random tests for string -> float conversion
issue7632_bug8.patch	mark.dickinson, 2010-01-15 21:21	Patch for the release blocker
dtoa_detect_leaks.patch	mark.dickinson, 2010-01-17 13:58	Detect leaks from dtoa and strtod.
memory_debugger.diff	skrah, 2010-01-17 15:41

Messages (41)
msg97205 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-01-04 12:50
In a debug build:

Python 3.2a0 (py3k:76671M, Dec 22 2009, 19:41:08)
[GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "2183167012312112312312.23538020374420446192e-370"
[30473 refs]
>>> f = float(s)
oversize b in quorem
msg97206 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-04 12:57
Nice catch! I'll take a look. We should find out whether this is something that happens with Gay's original code, or whether it was introduced in the process of adapting that code for Python.
msg97209 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-04 13:23
I can reproduce this on OS X 10.6 (64-bit), both in py3k and trunk debug builds. In non-debug builds it appears to return the correct result (0.0), so the oversize b appears to have no ill effects. So this may just be an overeager assert; it may be a symptom of a deeper problem, though.
msg97210 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2010-01-04 13:37
I'm testing on a Fedora Core 6 i386 box and an Intel Mac 32-bit 10.5 box. I only see this on debug builds. I've tested trunk, py3k, release31-maint, and release26-maint (just for giggles). The error shows up in debug builds of trunk, py3k, and release31-maint on both machines, and does not show up in non-debug builds.
msg97226 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-04 22:14
The bug is present in the current version of dtoa.c from http://www.netlib.org/fp, so I'll report it upstream. As far as I can tell, though, it's benign, in the sense that if the check is disabled then nothing bad happens, and the correct result is eventually returned (albeit after some unnecessary computation). I suspect that the problem is in the if block around lines 1531--1543 of Python/dtoa.c: a subnormal rv isn't being handled correctly here---it should end up being set to 0.0, but is instead set to 2**-968.
msg97417 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-08 17:10
Here's a patch that seems to fix the problem; I'll wait a while to see if I get a response from David Gay before applying this. Also, if we've got to the stage of modifying the algorithmic part of the original dtoa.c, we should really make sure that we've got our own set of comprehensive tests.
msg97439 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-08 20:44
Randomised testing quickly turned up another troublesome string for str -> float conversion:

s = "94393431193180696942841837085033647913224148539854e-358"

This one's actually giving incorrectly rounded results (the horror!) in a non-debug build of trunk, and giving the same 'oversize b in quorem' in a debug build. With the patch, it doesn't give the 'oversize b' error, but does still give incorrect results.

Python 2.7a1+ (trunk:77375, Jan 8 2010, 20:33:59)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "94393431193180696942841837085033647913224148539854e-358"
>>> float(s)  # result of dtoa.c
9.439343119318067e-309
>>> from __future__ import division
>>> int(s[:-5])/10**358  # result via (correctly rounded) division
9.43934311931807e-309

I also double checked this value using a simple pure Python implementation of strtod, and using MPFR (via the Python bigfloat module), with the same result:

>>> from test_dtoa import strtod
>>> strtod(s)  # result via a simple pure Python implementation of strtod
9.43934311931807e-309
>>> from bigfloat import *
>>> with double_precision: x = float(BigFloat(s))
>>> x  # result from MPFR, via the bigfloat module
9.43934311931807e-309
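For readers who want to repeat this kind of cross-check without test_dtoa.py or bigfloat, here is a minimal sketch that uses only the standard library. It assumes a current CPython, where converting a Fraction to float performs a correctly rounded integer division; the helper name exact_float is made up for this illustration and is not part of any patch on this issue.

```python
from fractions import Fraction


def exact_float(s):
    """Reference decimal-string -> float conversion (hypothetical helper).

    Fraction parses the decimal string exactly, and float(Fraction) divides
    numerator by denominator, which CPython rounds correctly.
    """
    return float(Fraction(s))


s = "94393431193180696942841837085033647913224148539854e-358"
print(exact_float(s))              # reference value from exact arithmetic
print(exact_float(s) == float(s))  # True once strtod rounds correctly
```

Any string for which the two conversions disagree would be a candidate for a rounding bug in the strtod under test.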
msg97443 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-08 22:18
Okay, I think I've found the cause of the second rounding bug above: at the end of the bigcomp function there's a correction block that looks like

    ...
    else if (dd < 0) {
        if (!dsign)     /* does not happen for round-near */
retlow1:
            dval(rv) -= ulp(rv);
    }
    else if (dd > 0) {
        if (dsign) {
rethi1:
            dval(rv) += ulp(rv);
        }
    }
    else ...

The problem is that the += and -= corrections don't take into account the possibility that bc->scale is nonzero, and for the case where the scaled rv is subnormal, they'll typically have no effect. I'll work on a fix... tomorrow.
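The size mismatch behind this can be illustrated numerically. The sketch below is an assumption-laden model, not the dtoa.c logic: it takes the scaling exponent to be 106 (2*53, labelled SCALE here) and uses math.ulp from Python 3.9+. It compares the ulp of the scaled value, mapped back to the unscaled range, with the true spacing of subnormal doubles.

```python
import math
from fractions import Fraction

SCALE = 106  # assumed scaling exponent (2 * 53); an assumption of this sketch

x = 9.43934311931807e-309      # the subnormal result from the example above
scaled = math.ldexp(x, SCALE)  # exact: the scaled value is a normal double

ulp_true = Fraction(math.ulp(x))                    # real spacing near x
ulp_scaled = Fraction(math.ulp(scaled)) / 2**SCALE  # scaled ulp, unscaled again

# Ratio of the correction actually applied to the one that is needed.
print(ulp_scaled / ulp_true)
```

Under these assumptions the correction is only a quarter of the real ulp, so after unscaling and rounding it would typically vanish, which matches the "no effect" behaviour described above.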
msg97458 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-09 16:47
Second patch, adding a fix for the rounding bug to the first patch.
msg97459 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-09 16:54
Here's the (rather crude) testing program that turned up these errors.
msg97524 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-10 17:34
One more incorrectly rounded result, this time for a normal number: AssertionError: Incorrectly rounded str->float conversion for 99999999999999994487665465554760717039532578546e-47: expected 0x1.0000000000000p+0, got 0x1.fffffffffffffp-1
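A quick way to see that this input is not a near-halfway case is to compare it, in exact arithmetic, with the midpoint of the two candidate doubles named in the assertion message. This is only a sketch using the standard library; the variable names are chosen for the illustration and ties are deliberately not handled.

```python
from fractions import Fraction

s = "99999999999999994487665465554760717039532578546e-47"

lo = float.fromhex("0x1.fffffffffffffp-1")  # the value that was returned
hi = float.fromhex("0x1.0000000000000p+0")  # the expected value, 1.0

midpoint = (Fraction(lo) + Fraction(hi)) / 2
exact = Fraction(s)                         # exact value of the decimal string

# Round to nearest: whichever side of the midpoint the exact value lies on.
nearest = hi if exact > midpoint else lo
print(nearest.hex())
print(float((exact - midpoint) / (Fraction(hi) - Fraction(lo))))
```

The second print shows how far the input sits from the halfway point, measured in units of the gap between the two candidates.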
msg97544 - (view)	Author: Tim Peters (tim.peters) *	Date: 2010-01-10 19:43
Showing once again that a proof of FP code correctness is about as compelling as a proof of God's ontological status ;-) Still, have to express surprised admiration for 99999999999999994487665465554760717039532578546e-47! That one's not even close to being a "hard" case.
msg97552 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-10 21:53
> Showing once again that a proof of FP code correctness is about as
> compelling as a proof of God's ontological status ;-)

Clearly we need a 1000-page Isabelle/HOL-style machine-checked formal proof, rather than a ten-page TeX proof. Any takers?

All of the above bugs seem to have been introduced with the new 'bigcomp' code that arrived on March 16, 2009, just a couple of weeks before I downloaded the version that got adapted for Python; in retrospect, I probably should have used the NO_STRTOD_BIGCOMP #define to bypass the new code.
msg97649 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-12 18:39
Progress report: I've had a response, and fix, from David Gay for the first 2 bugs (Stefan's original bug and the incorrect subnormal result); I'm still arguing with him about a 3rd one (not reported here; there's some possibly incorrect code in bigcomp that probably never actually gets called). I reported the 4th bug (the incorrect rounding for values near 1) to him today.

In the meantime, here's bug number 5, found by eyeballing the bigcomp code until it surrendered. :-)

>>> 1000000000000000000000000000000000000000e-16
1e+23
>>> 10000000000000000000000000000000000000000e-17
1.0000000000000001e+23
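The point of the example is that both literals denote exactly the same number, so a correct strtod must return the same double for both. A small sketch of that invariant follows; the strings are rebuilt programmatically in the same shape as the literals above (trailing zeros shifted into the exponent), and the names a and b are just for this illustration.

```python
from fractions import Fraction

# Same mathematical value, 10**23, written with different trailing zeros.
a = "1" + "0" * 39 + "e-16"
b = "1" + "0" * 40 + "e-17"

# Exact arithmetic confirms the two strings are the same number.
print(Fraction(a) == Fraction(b) == 10**23)

# A correctly rounding conversion therefore has to agree on both.
print(float(a) == float(b))
```

Padding a digit string with trailing zeros and compensating in the exponent is a cheap way to generate further inputs of this kind for random testing.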
msg97667 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-12 22:25
Fixed the crash that Stefan originally reported in r77450. That revision also removes the 'possibly incorrect code in bigcomp that probably never actually gets called'.
msg97670 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-12 22:56
Second bug fixed in r77451 (trunk), using a fix from David Gay, modified slightly for correctness.
msg97672 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-12 23:09
Merged fixes so far, and a couple of other cleanups, to py3k in r77452, and release31-maint in r77453.
msg97741 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-13 22:37
Just so I don't forget, there are a couple more places in dtoa.c that look suspicious and need to be checked; I haven't tried to generate failures for them yet. Since we're up to bug 5, I'll number these 6 and 7:

(6) at the end of bigcomp, when applying the round-to-even rule for halfway cases, the lsb of rv is checked. This looks wrong if bc->scale is nonzero.

(7) In the main strtod correction loop, after computing delta and i, there's a block:

    bc.nd = nd;
    i = -1; /* Discarded digits make delta smaller. */

This logic seems invalid if all the discarded digits are zero. (This is the same logic error as is causing bug 5: the bigcomp comparison code also assumes incorrectly that digit nd-1 of s0 is nonzero.)
msg97763 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-14 13:17
Bug 6 is indeed a bug: an example incorrectly-rounded string is: '104308485241983990666713401708072175773165034278685682646111762292409330928739751702404658197872319129036519947435319418387839758990478549477777586673075945844895981012024387992135617064532141489278815239849108105951619997829153633535314849999674266169258928940692239684771590065027025835804863585454872499320500023126142553932654370362024104462255244034053203998964360882487378334860197725139151265590832887433736189468858614521708567646743455601905935595381852723723645799866672558576993978025033590728687206296379801363024094048327273913079612469982585674824156000783167963081616214710691759864332339239688734656548790656486646106983450809073750535624894296242072010195710276073042036425579852459556183541199012652571123898996574563824424330960027873516082763671875e-1075' It's fixed in r77491. I'll add tests once the remaining (known) dtoa.c bugs are fixed.
msg97767 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-14 14:44
Bug 4 fixed in r77492. This just leaves bugs 5 and 7; I have a fix for these in the works.
msg97770 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-14 15:23
Tests committed in r77493.
msg97771 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-14 15:44
Fixes and tests so far merged to py3k in r77494, release31-maint in r77496.
msg97814 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-15 15:20
I was considering downgrading this to 'normal'. Then I found Bug 8, and it's a biggie:

>>> 10.900000000000000012345678912345678912345
10.0

Now I'm thinking it should be upgraded to release blocker instead.

The cause is in the _Py_strtod block that starts: 'if (nd > STRTOD_DIGLIM) {'... It truncates the input to 18 digits, and then deletes trailing zeros. But the code that deletes the zeros is buggy, and passes over the digit '9' just before the point.
msg97815 - (view)	Author: Tim Peters (tim.peters) *	Date: 2010-01-15 15:28
Mark, I agree that last one should be a release blocker -- it's truly dreadful. BTW, did you guess in advance just how many bugs there could be in this kind of code? I did ;-)
msg97816 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-15 15:32
Upgrading to release blocker. It'll almost certainly be fixed before the weekend is out. (And I will, of course, report it upstream.)
msg97850 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-15 21:21
Here's a patch for the release blocker. Eric, would you be interested in double checking the logic for this patch? Tim: No, I have to admit I didn't foresee quite this number of bugs. :)
msg97851 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-15 21:26
issue7632_bug8.patch uploaded to Rietveld: http://codereview.appspot.com/186168
msg97852 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2010-01-15 22:09
It looks correct to me, assuming this comment is correct:

/* scan back until we hit a nonzero digit. significant digit 'i'
   is s0[i] if i < nd0, s0[i+1] if i >= nd0. */

I didn't verify the comment itself.
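The indexing convention in that comment can be modelled in a few lines. This is a hedged sketch rather than the patch itself: it assumes s0 contains a decimal point at index nd0, the helper names are invented for the illustration, and the input is a made-up string of the same shape as the bug 8 example.

```python
def significant_digit(s0, nd0, i):
    """Return significant digit i of s0, skipping the '.' at index nd0."""
    return s0[i] if i < nd0 else s0[i + 1]


def strip_trailing_zeros(s0, nd0, nd):
    """Scan back from digit nd-1 to a nonzero digit; return the new count."""
    while nd > 1 and significant_digit(s0, nd0, nd - 1) == "0":
        nd -= 1
    return nd


s0 = "10.9" + "0" * 16 + "12345"  # hypothetical input, same shape as bug 8
nd0 = s0.index(".")               # number of digits before the point

nd = strip_trailing_zeros(s0, nd0, 18)  # start from the 18-digit truncation
digits = "".join(significant_digit(s0, nd0, i) for i in range(nd))
print(nd, digits)
```

With this convention the scan stops at the '9', so the truncated digit string keeps it; a scan that mishandles the point's position would produce "10" instead, which is consistent with the 10.0 result reported for bug 8.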
msg97857 - (view)	Author: Eric V. Smith (eric.smith) *	Date: 2010-01-15 22:42
I have a few minor comments posted on Rietveld, but nothing that would keep you from checking this in.
msg97874 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 12:14
Applied the bug 8 patch in r77519 (thanks Eric for reviewing!). For safety, I'll leave this as a release blocker until fixes have been merged to py3k and release31-maint. I've uploaded a fix for bugs 5 and 7 to Rietveld: http://codereview.appspot.com/186182 I still don't like the parsing code much: I'm tempted to pull out the calculation of y and z and do it after the parsing is complete. It's probably marginally less efficient that way, but it would help make the code clearer.
msg97888 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 18:00
I've applied a minimal fix for bugs 5 and 7 in r77530 (trunk). (I wasn't able to produce any strings that trigger bug 7, so it may not technically be a bug.) I'm continuing to review, comment, and clean up the remainder of _Py_dg_strtod.
msg97889 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 18:15
Fixes merged to py3k and release31-maint in r77535 and r77537.
msg97907 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 21:31
One of the buildbots just produced a MemoryError from test_strtod: http://www.python.org/dev/buildbot/all/builders/i386%20Ubuntu%203.x/builds/411 It looks as though there's a memory leak somewhere in dtoa.c. It's a bit difficult to tell, though, since the memory allocation functions in that file deliberately hold on to small pieces of memory.
msg97914 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 22:38
Okay, so there's a memory leak for overflowing values: if an overflow is detected in the main correction loop of _Py_dg_strtod, then 'references' to bd0, bd, bb, bs and delta aren't released. There may be other leaks; I'm trying to come up with a good way to detect them reliably.
msg97915 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-01-16 22:43
This is what Valgrind complains about:

==4750== 3,456 (1,440 direct, 2,016 indirect) bytes in 30 blocks are definitely lost in loss record 3,302 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C286E: _Py_dg_strtod (dtoa.c:1675)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)
==4750==    by 0x47F1CF: call_function (ceval.c:3891)

==4750== 9,680 bytes in 242 blocks are still reachable in loss record 3,369 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0875: i2b (dtoa.c:556)
==4750==    by 0x4C2906: _Py_dg_strtod (dtoa.c:1687)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 270,720 bytes in 1,692 blocks are indirectly lost in loss record 3,423 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0F97: diff (dtoa.c:825)
==4750==    by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 382,080 bytes in 2,388 blocks are indirectly lost in loss record 3,424 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 414,560 bytes in 2,591 blocks are indirectly lost in loss record 3,425 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 414,960 (414,768 direct, 192 indirect) bytes in 2,604 blocks are definitely lost in loss record 3,426 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0929: mult (dtoa.c:592)
==4750==    by 0x4C0B90: pow5mult (dtoa.c:691)
==4750==    by 0x4C2B1A: _Py_dg_strtod (dtoa.c:1753)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)

==4750== 890,720 (532,960 direct, 357,760 indirect) bytes in 3,331 blocks are definitely lost in loss record 3,428 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 1,021,280 (566,080 direct, 455,200 indirect) bytes in 3,538 blocks are definitely lost in loss record 3,429 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 1,465,280 (676,640 direct, 788,640 indirect) bytes in 4,229 blocks are definitely lost in loss record 3,430 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0F97: diff (dtoa.c:825)
==4750==    by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)
msg97919 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 23:28
Stefan, thanks for that! I'm not entirely sure how to make use of it, though. Is there a way to tell Valgrind that some leaks are expected?

The main problem with leak detection is that dtoa.c deliberately keeps hold of any malloc'ed chunks less than a certain size (which I think is something like 2048 bytes, but I'm not sure). These chunks are never freed in normal use; instead, they're added to a bunch of free lists for the next time that strtod or dtoa is called. The logic isn't too complicated: it's in the functions Balloc and Bfree in dtoa.c.

So the right thing to do is just to check that for each call to strtod, the total number of calls to Balloc matches the total number of calls to Bfree with non-NULL argument. And similarly for dtoa, except that in that case one of the Balloc'd blocks gets returned to the caller (it's the caller's responsibility to call free_dtoa to free it when it's no longer needed), so there should be a difference of 1.

And there's one further wrinkle: dtoa.c maintains a list of powers of 5 of the form 5**2**k, and this list is automatically extended with newly allocated Bigints when necessary: those Bigints are never freed either, so calls to Balloc from that source should be ignored. Another way round this is just to ignore any leak from the first call to strtod, and then do a repeat call with the same parameters; the second call will already have all the powers of 5 it needs.
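The balance check described here can be modelled with a toy allocator. The sketch below is not dtoa.c and not the dtoa_detect_leaks.patch; CountingPool, unbalanced and convert are names invented for the illustration, and the size-class cutoff KMAX = 7 is an assumption. It only shows why counting Balloc calls against non-NULL Bfree calls finds a leak even though freed blocks are parked on free lists rather than returned to malloc.

```python
class CountingPool:
    """Toy stand-in for a Balloc/Bfree pair with per-size free lists."""

    KMAX = 7  # assumed cutoff: blocks up to this size class are cached

    def __init__(self):
        self.freelist = {k: [] for k in range(self.KMAX + 1)}
        self.allocs = 0
        self.frees = 0

    def balloc(self, k):
        self.allocs += 1
        if k <= self.KMAX and self.freelist[k]:
            return self.freelist[k].pop()  # reuse a cached block
        return {"k": k, "words": [0] * (1 << k)}

    def bfree(self, block):
        if block is None:                  # NULL frees are not counted
            return
        self.frees += 1
        if block["k"] <= self.KMAX:
            self.freelist[block["k"]].append(block)


def unbalanced(pool, func, *args):
    """Return how many blocks func allocated without freeing them."""
    before = pool.allocs - pool.frees
    func(pool, *args)
    return (pool.allocs - pool.frees) - before


def convert(pool, overflow):
    """Hypothetical conversion routine with a leak on the overflow path."""
    bd0 = pool.balloc(1)
    bb = pool.balloc(2)
    if overflow:
        return                             # early exit: bd0 and bb not released
    pool.bfree(bb)
    pool.bfree(bd0)


pool = CountingPool()
print(unbalanced(pool, convert, False))    # balanced path
print(unbalanced(pool, convert, True))     # leaking path
```

A tool that only watches malloc and free cannot make this distinction, because cached blocks look "still reachable" whether or not the caller released them.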
msg97920 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-16 23:29
Upgrading to release blocker again: the memory leak should be fixed for 2.7 (and more immediately, for 3.1.2).
msg97945 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-01-17 13:53
Mark, thanks for the explanation! You can generate suppressions for the Misc/valgrind-python.supp file, but you have to know exactly which errors can be ignored. Going through the Valgrind output again, it looks like most of it is about what you already mentioned (bd0, bd, bb, bs and delta not being released).

Would it be much work to provide Valgrind-friendly versions of Balloc, Bfree and pow5mult? Balloc and Bfree are already mentioned in an XXX comment; pow5mult should be a slow version that doesn't cache anything. Perhaps these could be ifdef'd with Py_USING_MEMORY_DEBUGGER.
msg97946 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-17 13:58
Stefan, I'm not particularly familiar with Valgrind: can you tell me what would need to be done? Is a non-caching version of pow5mult all that's required? Here's the patch that I'm using to detect leaks at the moment. (It includes a slow pow5mult version.)
msg97952 - (view)	Author: Stefan Krah (skrah) *	Date: 2010-01-17 15:41
With the latest dtoa.c, your non-caching pow5mult and a quick hack for Balloc and Bfree I get zero (dtoa.c related) Valgrind errors. So the attached memory_debugger.diff is pretty much all that's needed for Valgrind.
msg97973 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2010-01-17 21:09
Thanks, Stefan. Applied in r77589 (trunk), r77590 (py3k), r77591 (release31-maint) with one small change: I moved the freelist and p5s declarations inside the #ifndef Py_USING_MEMORY_DEBUGGER conditionals. The leak itself was fixed in revisions r77578 through r77580; from Stefan's Valgrind report, and my own refcount testing, it looks as though that was the only leak point. I haven't finished reviewing/testing the _Py_dg_strtod code yet, but I'm going to close this issue; if anything new turns up I'll open another one.

History
Date	User	Action	Args
2022-04-11 14:56:56	admin	set	nosy: + benjamin.peterson, georg.brandl github: 51881
2010-01-17 21:09:44	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg97973
2010-01-17 15:41:59	skrah	set	files: + memory_debugger.diff messages: + msg97952
2010-01-17 13:58:23	mark.dickinson	set	files: + dtoa_detect_leaks.patch messages: + msg97946
2010-01-17 13:53:29	skrah	set	messages: + msg97945
2010-01-16 23:29:02	mark.dickinson	set	priority: normal -> release blocker messages: + msg97920
2010-01-16 23:28:15	mark.dickinson	set	messages: + msg97919
2010-01-16 22:43:56	skrah	set	messages: + msg97915
2010-01-16 22:38:20	mark.dickinson	set	messages: + msg97914
2010-01-16 21:31:16	mark.dickinson	set	messages: + msg97907
2010-01-16 18:15:42	mark.dickinson	set	priority: release blocker -> normal messages: + msg97889
2010-01-16 18:00:25	mark.dickinson	set	messages: + msg97888
2010-01-16 12:14:13	mark.dickinson	set	messages: + msg97874
2010-01-15 22:42:11	eric.smith	set	messages: + msg97857
2010-01-15 22:09:34	eric.smith	set	messages: + msg97852
2010-01-15 21:26:41	mark.dickinson	set	messages: + msg97851
2010-01-15 21:21:38	mark.dickinson	set	files: + issue7632_bug8.patch messages: + msg97850 title: dtoa.c: oversize b in quorem -> dtoa.c: oversize b in quorem, and a menagerie of other bugs
2010-01-15 15:32:14	mark.dickinson	set	priority: high -> release blocker messages: + msg97816
2010-01-15 15:28:44	tim.peters	set	messages: + msg97815
2010-01-15 15:20:39	mark.dickinson	set	messages: + msg97814
2010-01-14 15:44:21	mark.dickinson	set	messages: + msg97771
2010-01-14 15:23:14	mark.dickinson	set	messages: + msg97770
2010-01-14 14:44:51	mark.dickinson	set	messages: + msg97767
2010-01-14 13:17:45	mark.dickinson	set	messages: + msg97763
2010-01-13 22:37:33	mark.dickinson	set	messages: + msg97741
2010-01-12 23:09:41	mark.dickinson	set	messages: + msg97672
2010-01-12 22:56:52	mark.dickinson	set	messages: + msg97670
2010-01-12 22:25:14	mark.dickinson	set	messages: + msg97667
2010-01-12 18:39:08	mark.dickinson	set	messages: + msg97649
2010-01-10 21:53:42	mark.dickinson	set	messages: + msg97552
2010-01-10 19:43:57	tim.peters	set	nosy: + tim.peters messages: + msg97544
2010-01-10 17:34:56	mark.dickinson	set	messages: + msg97524
2010-01-09 16:54:16	mark.dickinson	set	files: + test_dtoa.py messages: + msg97459
2010-01-09 16:47:35	mark.dickinson	set	files: + issue7632_v2.patch messages: + msg97458
2010-01-08 22:18:31	mark.dickinson	set	messages: + msg97443
2010-01-08 20:44:11	mark.dickinson	set	messages: + msg97439 stage: patch review -> needs patch
2010-01-08 17:10:27	mark.dickinson	set	files: + issue7632.patch keywords: + patch messages: + msg97417 stage: needs patch -> patch review
2010-01-04 22:14:49	mark.dickinson	set	messages: + msg97226
2010-01-04 13:37:58	eric.smith	set	messages: + msg97210 versions: + Python 3.1
2010-01-04 13:23:17	mark.dickinson	set	priority: critical -> high messages: + msg97209
2010-01-04 12:57:50	mark.dickinson	set	nosy: mark.dickinson, eric.smith, skrah type: crash components: + Interpreter Core stage: needs patch
2010-01-04 12:57:15	mark.dickinson	set	priority: critical assignee: mark.dickinson messages: + msg97206 versions: + Python 2.7
2010-01-04 12:50:03	skrah	create