classification
Title: dtoa.c: oversize b in quorem, and a menagerie of other bugs
Type: crash
Stage: needs patch
Components: Interpreter Core
Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: mark.dickinson
Nosy List: eric.smith, mark.dickinson, skrah, tim.peters
Priority: release blocker
Keywords: patch

Created on 2010-01-04 12:50 by skrah, last changed 2010-01-17 21:09 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description
issue7632.patch mark.dickinson, 2010-01-08 17:10
issue7632_v2.patch mark.dickinson, 2010-01-09 16:47
test_dtoa.py mark.dickinson, 2010-01-09 16:54 Random tests for string -> float conversion
issue7632_bug8.patch mark.dickinson, 2010-01-15 21:21 Patch for the release blocker
dtoa_detect_leaks.patch mark.dickinson, 2010-01-17 13:58 Detect leaks from dtoa and strtod.
memory_debugger.diff skrah, 2010-01-17 15:41
Messages (41)
msg97205 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-04 12:50
In a debug build:

Python 3.2a0 (py3k:76671M, Dec 22 2009, 19:41:08) 
[GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "2183167012312112312312.23538020374420446192e-370"
[30473 refs]
>>> f = float(s)
oversize b in quorem
msg97206 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-04 12:57
Nice catch!  I'll take a look.  We should find out whether this is something that happens with Gay's original code, or whether it was introduced in the process of adapting that code for Python.
msg97209 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-04 13:23
I can reproduce this on OS X 10.6 (64-bit), both in py3k and trunk debug builds.  In non-debug builds it appears to return the correct result (0.0), so the oversize b appears to have no ill effects.  So this may just be an overeager assert;  it may be a symptom of a deeper problem, though.
msg97210 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-01-04 13:37
I'm testing on a Fedora Core 6 i386 box and an Intel Mac 32-bit 10.5 box. I only see this on debug builds. I've tested trunk, py3k, release31-maint, and release26-maint (just for giggles).

The error shows up in debug builds of trunk, py3k, and release31-maint on both machines, and does not show up in non-debug builds.
msg97226 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-04 22:14
The bug is present in the current version of dtoa.c from http://www.netlib.org/fp, so I'll report it upstream.  As far as I can tell, though, it's benign, in the sense that if the check is disabled then nothing bad happens, and the correct result is eventually returned (albeit after some unnecessary computation).

I suspect that the problem is in the if block around lines 1531--1543 of Python/dtoa.c:  a subnormal rv isn't being handled correctly here---it should end up being set to 0.0, but is instead set to 2**-968.
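
A quick sanity check of why 0.0 is the right answer for Stefan's string (a sketch for readers, not part of any patch here): the exact decimal value lies far below 2**-1075, half the smallest subnormal double, so round-to-nearest must give zero. It assumes only that `fractions.Fraction` parses the decimal string exactly; the variable names are illustrative.

```python
from fractions import Fraction

s = "2183167012312112312312.23538020374420446192e-370"

# Exact rational value of the decimal string.
exact = Fraction(s)

# Half of the smallest positive subnormal double (2**-1074).
half_min_subnormal = Fraction(1, 2**1075)

# Anything strictly below this threshold rounds to zero.
rounds_to_zero = exact < half_min_subnormal

# On a build with the fixes from this issue, the conversion agrees.
result = float(s)
```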
msg97417 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-08 17:10
Here's a patch that seems to fix the problem;  I'll wait a while to see if I get a response from David Gay before applying this.

Also, if we've got to the stage of modifying the algorithmic part of the original dtoa.c, we should really make sure that we've got our own set of comprehensive tests.
msg97439 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-08 20:44
Randomised testing quickly turned up another troublesome string for str -> float conversion:

s = "94393431193180696942841837085033647913224148539854e-358"

This one's actually giving incorrectly rounded results (the horror!) in a non-debug build of trunk, and giving the same 'oversize b in quorem' in a debug build.  With the patch, it doesn't give the 'oversize b' error, but does still give incorrect results.

Python 2.7a1+ (trunk:77375, Jan  8 2010, 20:33:59) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "94393431193180696942841837085033647913224148539854e-358"
>>> float(s)   # result of dtoa.c
9.439343119318067e-309
>>> from __future__ import division
>>> int(s[:-5])/10**358  # result via (correctly rounded) division
9.43934311931807e-309

I also double checked this value using a simple pure Python implementation of strtod, and using MPFR (via the Python bigfloat module), with the same result:

>>> from test_dtoa import strtod
>>> strtod(s)  # result via a simple pure Python implementation of strtod
9.43934311931807e-309
>>> from bigfloat import *
>>> with double_precision: x = float(BigFloat(s))
>>> x   # result from MPFR, via the bigfloat module
9.43934311931807e-309
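
The division cross-check above can be repeated on Python 3, where `/` is already true division and no `__future__` import is needed. This is a sketch that relies on int/int true division being correctly rounded in CPython, which is my understanding of the implementation; the names `reference` and `parsed` are mine.

```python
s = "94393431193180696942841837085033647913224148539854e-358"

# Split the literal into its digit string and its decimal exponent.
digits, exponent = s.split("e")

# Big-integer true division: the reference value.
reference = int(digits) / 10 ** (-int(exponent))

# The value produced by the interpreter's string -> float conversion.
parsed = float(s)
```

On an interpreter that includes the fixes from this issue, `parsed == reference` should hold.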
msg97443 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-08 22:18
Okay, I think I've found the cause of the second rounding bug above: at the end of the bigcomp function there's a correction block that looks like

    ...
    else if (dd < 0) {
        if (!dsign)     /* does not happen for round-near */
          retlow1:
            dval(rv) -= ulp(rv);
    }
    else if (dd > 0) {
        if (dsign) {
          rethi1:
            dval(rv) += ulp(rv);
        }
    }
    else ...

The problem is that the += and -= corrections don't take into account the possibility that bc->scale is nonzero, and for the case where the scaled rv is subnormal, they'll typically have no effect.

I'll work on a fix...  tomorrow.
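
To make the "no effect" point concrete, here is a small Python model of the problem (a sketch, not the C code). It assumes the scale factor is 2**106, i.e. twice the 53-bit precision, which is my reading of how dtoa.c scales subnormal candidates; `math.ulp` and `math.nextafter` need Python 3.9 or later.

```python
import math

x = 9.43934311931807e-309        # a subnormal double
scale = 2.0 ** 106               # assumed scale factor (2*53 bits)

scaled = x * scale               # exact: x has few significant bits

# Naive correction: one ulp of the *scaled* value.
bumped = scaled + math.ulp(scaled)
naive_result = bumped * 2.0 ** -106   # rounds straight back to x

# Scale-aware correction: one ulp of the *unscaled* value, scaled up.
step = math.ulp(x) * scale
fixed_result = (scaled + step) * 2.0 ** -106
```

The naive step is only a quarter of the subnormal spacing once unscaled, so it is lost in the final rounding; the scale-aware step moves to the next representable double.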
msg97458 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-09 16:47
Second patch, adding a fix for the rounding bug to the first patch.
msg97459 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-09 16:54
Here's the (rather crude) testing program that turned up these errors.
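
For readers without the attachment, the general idea of such a randomised test can be sketched as follows. This is not the contents of test_dtoa.py; the function names, digit counts, and exponent range are illustrative assumptions, and the reference value again relies on correctly rounded int/int division.

```python
import random
from fractions import Fraction

def reference_float(s):
    """Reference conversion via exact rational arithmetic."""
    q = Fraction(s)
    return q.numerator / q.denominator

def random_decimal(rng, max_digits=40):
    """Random digit string with an exponent chosen to avoid overflow."""
    ndigits = rng.randint(1, max_digits)
    digits = "".join(rng.choice("0123456789") for _ in range(ndigits))
    exponent = rng.randint(-360, 260)
    return f"{digits}e{exponent}"

def find_mismatches(n, seed=0):
    """Return the generated strings where float() disagrees with the reference."""
    rng = random.Random(seed)
    bad = []
    for _ in range(n):
        s = random_decimal(rng)
        if float(s) != reference_float(s):
            bad.append(s)
    return bad
```

Calling `find_mismatches(2000)` on a fixed interpreter should return an empty list; purely random inputs rarely hit the hard cases, so a serious tester would also generate strings near halfway points and near the subnormal boundary.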
msg97524 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-10 17:34
One more incorrectly rounded result, this time for a normal number:

AssertionError: Incorrectly rounded str->float conversion for 99999999999999994487665465554760717039532578546e-47: expected 0x1.0000000000000p+0, got 0x1.fffffffffffffp-1
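
A short exact-arithmetic check of the expected value (a sketch; variable names are mine): the input lies just above the midpoint between 0x1.fffffffffffffp-1 and 1.0, so round-to-nearest must give 1.0.

```python
from fractions import Fraction

s = "99999999999999994487665465554760717039532578546e-47"
exact = Fraction(s)

# The largest double below 1.0, and the midpoint between it and 1.0.
below = Fraction(float.fromhex("0x1.fffffffffffffp-1"))
midpoint = (below + 1) / 2

# Above the midpoint means the nearest double is 1.0.
above_midpoint = exact > midpoint
```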
msg97544 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-01-10 19:43
Showing once again that a proof of FP code correctness is about as compelling as a proof of God's ontological status ;-)

Still, have to express surprised admiration for 99999999999999994487665465554760717039532578546e-47!  That one's not even close to being a "hard" case.
msg97552 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-10 21:53
> Showing once again that a proof of FP code correctness is about as
> compelling as a proof of God's ontological status ;-)

Clearly we need a 1000-page Isabelle/HOL-style machine-checked formal proof, rather than a ten-page TeX proof.  Any takers?

All of the above bugs seem to have been introduced with the new 'bigcomp' code that arrived on March 16, 2009, just a couple of weeks before I downloaded the version that got adapted for Python;  in retrospect, I probably should have used the NO_STRTOD_BIGCOMP #define to bypass the new code.
msg97649 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-12 18:39
Progress report:  I've had a response, and fix, from David Gay for the first 2 bugs (Stefan's original bug and the incorrect subnormal result);  I'm still arguing with him about a 3rd one (not reported here; there's some possibly incorrect code in bigcomp that probably never actually gets called).  I reported the 4th bug (the incorrect rounding for values near 1) to him today.  In the meantime, here's bug number 5, found by eyeballing the bigcomp code until it surrendered.  :-)

>>> 1000000000000000000000000000000000000000e-16
1e+23
>>> 10000000000000000000000000000000000000000e-17
1.0000000000000001e+23
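
Both literals denote exactly 10**23, so they must convert to the same double. A minimal regression-style check in the same spirit (a sketch; the strings are built programmatically rather than copied, so the zero counts here are my own construction):

```python
# Two spellings of 10**23 with many trailing zeros in the digit string.
a = float("1" + "0" * 39 + "e-16")
b = float("1" + "0" * 40 + "e-17")

# The plain spelling, for comparison.
plain = float("1e23")
```

On a fixed build all three values should be identical.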
msg97667 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-12 22:25
Fixed the crash that Stefan originally reported in r77450.  That revision also removes the 'possibly incorrect code in bigcomp that probably never actually gets called'.
msg97670 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-12 22:56
Second bug fixed in r77451 (trunk), using a fix from David Gay, modified slightly for correctness.
msg97672 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-12 23:09
Merged fixes so far, and a couple of other cleanups, to py3k in r77452, and release31-maint in r77453.
msg97741 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-13 22:37
Just so I don't forget, there are a couple more places in dtoa.c that look suspicious and need to be checked;  I haven't tried to generate failures for them yet.  Since we're up to bug 5, I'll number these 6 and 7:

(6) at the end of bigcomp, when applying the round-to-even rule for halfway cases, the lsb of rv is checked.  This looks wrong if bc->scale is nonzero.

(7) In the main strtod correction loop, after computing delta and i, there's a block:

                bc.nd = nd;
                i = -1; /* Discarded digits make delta smaller. */

This logic seems invalid if all the discarded digits are zero.  (This is the same logic error that is causing bug 5:  the bigcomp comparison code also incorrectly assumes that digit nd-1 of s0 is nonzero.)
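
For reference, the round-half-to-even rule that item (6) is about can be seen in the unscaled case with integers just above 2**53, where doubles are spaced 2 apart. This is only an illustration of the rule; it does not exercise the bc->scale path, where the low bit of the scaled rv need not be the low bit of the final result.

```python
# 2**53 + 1 is exactly halfway between 2**53 and 2**53 + 2.
tie_low = float("9007199254740993")

# 2**53 + 3 is exactly halfway between 2**53 + 2 and 2**53 + 4.
tie_high = float("9007199254740995")

# Ties go to the neighbour whose significand is even.
low_expected = float(2**53)        # significand 2**52, even
high_expected = float(2**53 + 4)   # significand 2**52 + 2, even
```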
msg97763 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-14 13:17
Bug 6 is indeed a bug: an example incorrectly-rounded string is:
        '104308485241983990666713401708072175773165034278685682646111762292409330928739751702404658197872319129036519947435319418387839758990478549477777586673075945844895981012024387992135617064532141489278815239849108105951619997829153633535314849999674266169258928940692239684771590065027025835804863585454872499320500023126142553932654370362024104462255244034053203998964360882487378334860197725139151265590832887433736189468858614521708567646743455601905935595381852723723645799866672558576993978025033590728687206296379801363024094048327273913079612469982585674824156000783167963081616214710691759864332339239688734656548790656486646106983450809073750535624894296242072010195710276073042036425579852459556183541199012652571123898996574563824424330960027873516082763671875e-1075'

It's fixed in r77491.  I'll add tests once the remaining (known) dtoa.c bugs are fixed.
msg97767 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-14 14:44
Bug 4 fixed in r77492.  This just leaves bugs 5 and 7;  I have a fix for these in the works.
msg97770 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-14 15:23
Tests committed in r77493.
msg97771 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-14 15:44
Fixes and tests so far merged to py3k in r77494, release31-maint in r77496.
msg97814 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-15 15:20
I was considering downgrading this to 'normal'.  Then I found Bug 8, and it's a biggie:

>>> 10.900000000000000012345678912345678912345
10.0

Now I'm thinking it should be upgraded to release blocker instead.

The cause is in the _Py_strtod block that starts: 'if (nd > STRTOD_DIGLIM) {'...  It truncates the input to 18 digits, and then deletes trailing zeros.  But the code that deletes the zeros is buggy, and passes over the digit '9' just before the point.
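
The indexing that the zero-stripping step has to get right can be modelled in a few lines of Python. This is a sketch of the idea only, not the C patch: it uses the convention that significant digit i is s0[i] before the point and s0[i+1] after it, and the function names and test string are my own (the string has the same shape as the failing input, built programmatically).

```python
def significant_digit(s0, nd0, i):
    """Return significant digit i of s0, skipping the point at index nd0."""
    return s0[i] if i < nd0 else s0[i + 1]

def strip_trailing_zeros(s0, nd0, nd):
    """Return how many of the first nd significant digits remain
    after dropping trailing zeros."""
    while nd > 0 and significant_digit(s0, nd0, nd - 1) == "0":
        nd -= 1
    return nd

s0 = "10.9" + "0" * 16 + "12345678912345678912345"
nd0 = s0.index(".")                      # digits before the point

# Truncate to 18 significant digits, then strip trailing zeros.
kept = strip_trailing_zeros(s0, nd0, 18)
digits = "".join(significant_digit(s0, nd0, i) for i in range(kept))
```

Here the scan stops at the '9', leaving the three digits 1, 0, 9; stepping past that digit is what turns 10.9 into 10.0.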
msg97815 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2010-01-15 15:28
Mark, I agree that last one should be a release blocker -- it's truly dreadful.

BTW, did you guess in advance just how many bugs there could be in this kind of code?  I did ;-)
msg97816 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-15 15:32
Upgrading to release blocker.  It'll almost certainly be fixed before the weekend is out.  (And I will, of course, report it upstream.)
msg97850 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-15 21:21
Here's a patch for the release blocker.

Eric, would you be interested in double checking the logic for this patch? 

Tim:  No, I have to admit I didn't foresee quite this number of bugs.  :)
msg97851 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-15 21:26
issue7632_bug8.patch uploaded to Rietveld:

http://codereview.appspot.com/186168
msg97852 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-01-15 22:09
It looks correct to me, assuming this comment is correct:

     /* scan back until we hit a nonzero digit.  significant digit 'i'
        is s0[i] if i < nd0, s0[i+1] if i >= nd0. */

I didn't verify the comment itself.
msg97857 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-01-15 22:42
I have a few minor comments posted on Rietveld, but nothing that would keep you from checking this in.
msg97874 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 12:14
Applied the bug 8 patch in r77519 (thanks Eric for reviewing!).  For safety, I'll leave this as a release blocker until fixes have been merged to py3k and release31-maint.

I've uploaded a fix for bugs 5 and 7 to Rietveld:

http://codereview.appspot.com/186182

I still don't like the parsing code much:  I'm tempted to pull out the calculation of y and z and do it after the parsing is complete.  It's probably marginally less efficient that way, but it would help make the code clearer.
msg97888 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 18:00
I've applied a minimal fix for bugs 5 and 7 in r77530 (trunk).  (I wasn't able to produce any strings that trigger bug 7, so it may not technically be a bug.)

I'm continuing to review, comment, and clean up the remainder of _Py_dg_strtod.
msg97889 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 18:15
Fixes merged to py3k and release31-maint in r77535 and r77537.
msg97907 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 21:31
One of the buildbots just produced a MemoryError from test_strtod:

http://www.python.org/dev/buildbot/all/builders/i386%20Ubuntu%203.x/builds/411

It looks as though there's a memory leak somewhere in dtoa.c.  It's a bit difficult to tell, though, since the memory allocation functions in that file deliberately hold on to small pieces of memory.
msg97914 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 22:38
Okay, so there's a memory leak for overflowing values:  if an overflow is detected in the main correction loop of _Py_dg_strtod, then 'references' to bd0, bd, bb, bs and delta aren't released.

There may be other leaks;  I'm trying to come up with a good way to detect them reliably.
msg97915 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-16 22:43
This is what Valgrind complains about:

==4750== 3,456 (1,440 direct, 2,016 indirect) bytes in 30 blocks are definitely lost in loss record 3,302 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C286E: _Py_dg_strtod (dtoa.c:1675)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)
==4750==    by 0x47F1CF: call_function (ceval.c:3891)

==4750== 9,680 bytes in 242 blocks are still reachable in loss record 3,369 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0875: i2b (dtoa.c:556)
==4750==    by 0x4C2906: _Py_dg_strtod (dtoa.c:1687)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 270,720 bytes in 1,692 blocks are indirectly lost in loss record 3,423 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0F97: diff (dtoa.c:825)
==4750==    by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 382,080 bytes in 2,388 blocks are indirectly lost in loss record 3,424 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 414,560 bytes in 2,591 blocks are indirectly lost in loss record 3,425 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 414,960 (414,768 direct, 192 indirect) bytes in 2,604 blocks are definitely lost in loss record 3,426 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0929: mult (dtoa.c:592)
==4750==    by 0x4C0B90: pow5mult (dtoa.c:691)
==4750==    by 0x4C2B1A: _Py_dg_strtod (dtoa.c:1753)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)

==4750== 890,720 (532,960 direct, 357,760 indirect) bytes in 3,331 blocks are definitely lost in loss record 3,428 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 1,021,280 (566,080 direct, 455,200 indirect) bytes in 3,538 blocks are definitely lost in loss record 3,429 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0C82: lshift (dtoa.c:730)
==4750==    by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)

==4750== 1,465,280 (676,640 direct, 788,640 indirect) bytes in 4,229 blocks are definitely lost in loss record 3,430 of 3,430
==4750==    at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==4750==    by 0x41B7B5: PyMem_Malloc (object.c:1740)
==4750==    by 0x4C03CF: Balloc (dtoa.c:352)
==4750==    by 0x4C0F97: diff (dtoa.c:825)
==4750==    by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
==4750==    by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
==4750==    by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
==4750==    by 0x543968: PyFloat_FromString (floatobject.c:192)
==4750==    by 0x546E74: float_new (floatobject.c:1569)
==4750==    by 0x42B5C9: type_call (typeobject.c:664)
==4750==    by 0x516442: PyObject_Call (abstract.c:2160)
==4750==    by 0x47FDAE: do_call (ceval.c:4088)
msg97919 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 23:28
Stefan, thanks for that!  I'm not entirely sure how to make use of it, though.  Is there a way to tell Valgrind that some leaks are expected?

The main problem with leak detection is that dtoa.c deliberately keeps hold of any malloc'ed chunks less than a certain size (which I think is something like 2048 bytes, but I'm not sure).  These chunks are never freed in normal use;  instead, they're added to a bunch of free lists for the next time that strtod or dtoa is called.  The logic isn't too complicated:  it's in the functions Balloc and Bfree in dtoa.c.

So the right thing to do is just to check that for each call to strtod, the total number of calls to Balloc matches the total number of calls to Bfree with non-NULL argument.  And similarly for dtoa, except that in that case one of the Balloc'd blocks gets returned to the caller (it's the caller's responsibility to call free_dtoa to free it when it's no longer needed), so there should be a difference of 1.

And there's one further wrinkle:  dtoa.c maintains a list of powers of 5 of the form 5**2**k, and this list is automatically extended with newly allocated Bigints when necessary:  those Bigints are never freed either, so calls to Balloc from that source should be ignored.  Another way round this is just to ignore any leak from the first call to strtod, and then do a repeat call with the same parameters;  the second call will already have all the powers of 5 it needs.
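
The counting approach described above can be modelled with a toy freelist allocator. This is a Python sketch of the bookkeeping only, not the dtoa.c code: the class, the `kmax` cutoff, and the two example callers are hypothetical.

```python
class CountingPool:
    """Toy freelist allocator that counts balloc/bfree calls."""

    def __init__(self, kmax=7):
        self.kmax = kmax
        self.freelist = {k: [] for k in range(kmax + 1)}
        self.allocs = 0
        self.frees = 0

    def balloc(self, k):
        self.allocs += 1
        # Small blocks are recycled from the freelist when possible.
        if k <= self.kmax and self.freelist[k]:
            return self.freelist[k].pop()
        return {"k": k, "words": [0] * (1 << k)}

    def bfree(self, block):
        if block is None:          # freeing NULL is not counted
            return
        self.frees += 1
        if block["k"] <= self.kmax:
            self.freelist[block["k"]].append(block)

def leaked_blocks(pool, func):
    """Net number of blocks allocated but not freed during func(pool)."""
    before = pool.allocs - pool.frees
    func(pool)
    return (pool.allocs - pool.frees) - before

def balanced_call(pool):
    blocks = [pool.balloc(1) for _ in range(5)]
    for block in blocks:
        pool.bfree(block)

def leaky_call(pool):
    # Models an early exit that forgets to release its five blocks.
    for _ in range(5):
        pool.balloc(1)
```

A balance of 0 is what a strtod call should show, and 1 for dtoa because of the block handed back to the caller; blocks sitting on the freelist do not disturb the count because only calls are tallied.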
msg97920 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-16 23:29
Upgrading to release blocker again:  the memory leak should be fixed for 2.7 (and more immediately, for 3.1.2).
msg97945 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-17 13:53
Mark, thanks for the explanation! - You can generate suppressions for the Misc/valgrind-python.supp file, but you have to know exactly which errors can be ignored.

Going through the Valgrind output again, it looks like most of it is about what you already mentioned (bd0, bd, bb, bs and delta not being released).

Would it be much work to provide Valgrind-friendly versions of Balloc, Bfree and pow5mult? Balloc and Bfree are already mentioned in an XXX comment; pow5mult should be a slow version that doesn't cache anything. Perhaps these could be ifdef'd with Py_USING_MEMORY_DEBUGGER.
msg97946 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-17 13:58
Stefan, I'm not particularly familiar with Valgrind:  can you tell me what would need to be done?  Is a non-caching version of pow5mult all that's required?

Here's the patch that I'm using to detect leaks at the moment.  (It includes a slow pow5mult version.)
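
The difference between a caching and a non-caching pow5mult can be sketched in Python with plain integers standing in for Bigints. This is an illustration of the idea, not the patch: the function names and the module-level `_p5s` list are assumptions modelled on the description of the 5**2**k cache.

```python
_p5s = [625]   # cached 5**(2**k) for k >= 2; grows on demand, never freed

def pow5mult_cached(b, k):
    """Return b * 5**k, reusing cached repeated squares of 625."""
    b *= (1, 5, 25, 125)[k & 3]    # handle the low two bits directly
    k >>= 2
    i = 0
    while k:
        if i == len(_p5s):
            _p5s.append(_p5s[-1] ** 2)   # extend the cache
        if k & 1:
            b *= _p5s[i]
        k >>= 1
        i += 1
    return b

def pow5mult_nocache(b, k):
    """Same result, but every power is recomputed and then discarded."""
    b *= (1, 5, 25, 125)[k & 3]
    k >>= 2
    p5 = 625
    while k:
        if k & 1:
            b *= p5
        k >>= 1
        if k:
            p5 *= p5
    return b
```

The non-caching variant is slower on repeated calls, but it leaves nothing allocated after it returns, which is what a leak checker wants to see.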
msg97952 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-01-17 15:41
With the latest dtoa.c, your non-caching pow5mult and a quick hack for Balloc and Bfree I get zero (dtoa.c related) Valgrind errors.

So the attached memory_debugger.diff is pretty much all that's needed for Valgrind.
msg97973 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-01-17 21:09
Thanks, Stefan.  Applied in r77589 (trunk), r77590 (py3k), r77591 (release31-maint) with one small change:  I moved the freelist and p5s declarations inside the #ifndef Py_USING_MEMORY_DEBUGGER conditionals.

The leak itself was fixed in revisions r77578 through r77580; from Stefan's Valgrind report, and my own refcount testing, it looks as though that was the only leak point.

I haven't finished reviewing/testing the _Py_dg_strtod code yet, but I'm going to close this issue;  if anything new turns up I'll open another one.
History
Date User Action Args
2010-01-17 21:09:44 mark.dickinson set status: open -> closed; resolution: fixed; messages: + msg97973
2010-01-17 15:41:59 skrah set files: + memory_debugger.diff; messages: + msg97952
2010-01-17 13:58:23 mark.dickinson set files: + dtoa_detect_leaks.patch; messages: + msg97946
2010-01-17 13:53:29 skrah set messages: + msg97945
2010-01-16 23:29:02 mark.dickinson set priority: normal -> release blocker; messages: + msg97920
2010-01-16 23:28:15 mark.dickinson set messages: + msg97919
2010-01-16 22:43:56 skrah set messages: + msg97915
2010-01-16 22:38:20 mark.dickinson set messages: + msg97914
2010-01-16 21:31:16 mark.dickinson set messages: + msg97907
2010-01-16 18:15:42 mark.dickinson set priority: release blocker -> normal; messages: + msg97889
2010-01-16 18:00:25 mark.dickinson set messages: + msg97888
2010-01-16 12:14:13 mark.dickinson set messages: + msg97874
2010-01-15 22:42:11 eric.smith set messages: + msg97857
2010-01-15 22:09:34 eric.smith set messages: + msg97852
2010-01-15 21:26:41 mark.dickinson set messages: + msg97851
2010-01-15 21:21:38 mark.dickinson set files: + issue7632_bug8.patch; messages: + msg97850; title: dtoa.c: oversize b in quorem -> dtoa.c: oversize b in quorem, and a menagerie of other bugs
2010-01-15 15:32:14 mark.dickinson set priority: high -> release blocker; messages: + msg97816
2010-01-15 15:28:44 tim.peters set messages: + msg97815
2010-01-15 15:20:39 mark.dickinson set messages: + msg97814
2010-01-14 15:44:21 mark.dickinson set messages: + msg97771
2010-01-14 15:23:14 mark.dickinson set messages: + msg97770
2010-01-14 14:44:51 mark.dickinson set messages: + msg97767
2010-01-14 13:17:45 mark.dickinson set messages: + msg97763
2010-01-13 22:37:33 mark.dickinson set messages: + msg97741
2010-01-12 23:09:41 mark.dickinson set messages: + msg97672
2010-01-12 22:56:52 mark.dickinson set messages: + msg97670
2010-01-12 22:25:14 mark.dickinson set messages: + msg97667
2010-01-12 18:39:08 mark.dickinson set messages: + msg97649
2010-01-10 21:53:42 mark.dickinson set messages: + msg97552
2010-01-10 19:43:57 tim.peters set nosy: + tim.peters; messages: + msg97544
2010-01-10 17:34:56 mark.dickinson set messages: + msg97524
2010-01-09 16:54:16 mark.dickinson set files: + test_dtoa.py; messages: + msg97459
2010-01-09 16:47:35 mark.dickinson set files: + issue7632_v2.patch; messages: + msg97458
2010-01-08 22:18:31 mark.dickinson set messages: + msg97443
2010-01-08 20:44:11 mark.dickinson set messages: + msg97439; stage: patch review -> needs patch
2010-01-08 17:10:27 mark.dickinson set files: + issue7632.patch; keywords: + patch; messages: + msg97417; stage: needs patch -> patch review
2010-01-04 22:14:49 mark.dickinson set messages: + msg97226
2010-01-04 13:37:58 eric.smith set messages: + msg97210; versions: + Python 3.1
2010-01-04 13:23:17 mark.dickinson set priority: critical -> high; messages: + msg97209
2010-01-04 12:57:50 mark.dickinson set nosy: mark.dickinson, eric.smith, skrah; type: crash; components: + Interpreter Core; stage: needs patch
2010-01-04 12:57:15 mark.dickinson set priority: critical; assignee: mark.dickinson; messages: + msg97206; versions: + Python 2.7
2010-01-04 12:50:03 skrah create