dtoa.c: oversize b in quorem, and a menagerie of other bugs #51881

Closed
skrah mannequin opened this issue Jan 4, 2010 · 41 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@skrah
Mannequin

skrah mannequin commented Jan 4, 2010

BPO 7632
Nosy @tim-one, @mdickinson, @ericvsmith, @skrah
Files
  • issue7632.patch
  • issue7632_v2.patch
  • test_dtoa.py: Random tests for string -> float conversion
  • issue7632_bug8.patch: Patch for the release blocker
  • dtoa_detect_leaks.patch: Detect leaks from dtoa and strtod.
  • memory_debugger.diff
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2010-01-17.21:09:44.862>
    created_at = <Date 2010-01-04.12:50:03.934>
    labels = ['interpreter-core', 'type-crash', 'release-blocker']
    title = 'dtoa.c: oversize b in quorem, and a menagerie of other bugs'
    updated_at = <Date 2010-01-17.21:09:44.860>
    user = 'https://github.com/skrah'

    bugs.python.org fields:

    activity = <Date 2010-01-17.21:09:44.860>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2010-01-17.21:09:44.862>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2010-01-04.12:50:03.934>
    creator = 'skrah'
    dependencies = []
    files = ['15792', '15796', '15797', '15899', '15926', '15930']
    hgrepos = []
    issue_num = 7632
    keywords = ['patch']
    message_count = 41.0
    messages = ['97205', '97206', '97209', '97210', '97226', '97417', '97439', '97443', '97458', '97459', '97524', '97544', '97552', '97649', '97667', '97670', '97672', '97741', '97763', '97767', '97770', '97771', '97814', '97815', '97816', '97850', '97851', '97852', '97857', '97874', '97888', '97889', '97907', '97914', '97915', '97919', '97920', '97945', '97946', '97952', '97973']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'mark.dickinson', 'eric.smith', 'skrah']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue7632'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @skrah
    Mannequin Author

    skrah mannequin commented Jan 4, 2010

    In a debug build:

    Python 3.2a0 (py3k:76671M, Dec 22 2009, 19:41:08) 
    [GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = "2183167012312112312312.23538020374420446192e-370"
    [30473 refs]
    >>> f = float(s)
    oversize b in quorem

    @mdickinson
    Member

    Nice catch! I'll take a look. We should find out whether this is something that happens with Gay's original code, or whether it was introduced in the process of adapting that code for Python.

    @mdickinson mdickinson self-assigned this Jan 4, 2010
    @mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Jan 4, 2010
    @mdickinson
    Member

    I can reproduce this on OS X 10.6 (64-bit), both in py3k and trunk debug builds. In non-debug builds it appears to return the correct result (0.0), so the oversize b appears to have no ill effects. This may just be an overeager assert, but it could also be a symptom of a deeper problem.

    @ericvsmith
    Member

    I'm testing on a Fedora Core 6 i386 box and an Intel Mac 32-bit 10.5 box. I only see this on debug builds. I've tested trunk, py3k, release31-maint, and release26-maint (just for giggles).

    The error shows up in debug builds of trunk, py3k, and release31-maint on both machines, and does not show up in non-debug builds.

    @mdickinson
    Member

    The bug is present in the current version of dtoa.c from http://www.netlib.org/fp, so I'll report it upstream. As far as I can tell, though, it's benign, in the sense that if the check is disabled then nothing bad happens, and the correct result is eventually returned (albeit after some unnecessary computation).

    I suspect that the problem is in the if block around lines 1531--1543 of Python/dtoa.c: a subnormal rv isn't being handled correctly here---it should end up being set to 0.0, but is instead set to 2**-968.

    @mdickinson
    Member

    Here's a patch that seems to fix the problem; I'll wait a while to see if I get a response from David Gay before applying this.

    Also, if we've got to the stage of modifying the algorithmic part of the original dtoa.c, we should really make sure that we've got our own set of comprehensive tests.

    @mdickinson
    Member

    Randomised testing quickly turned up another troublesome string for str -> float conversion:

    s = "94393431193180696942841837085033647913224148539854e-358"

    This one's actually giving incorrectly rounded results (the horror!) in a non-debug build of trunk, and giving the same 'oversize b in quorem' in a debug build. With the patch, it doesn't give the 'oversize b' error, but does still give incorrect results.

    Python 2.7a1+ (trunk:77375, Jan  8 2010, 20:33:59) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = "94393431193180696942841837085033647913224148539854e-358"
    >>> float(s)   # result of dtoa.c
    9.439343119318067e-309
    >>> from __future__ import division
    >>> int(s[:-5])/10**358  # result via (correctly rounded) division
    9.43934311931807e-309

    I also double checked this value using a simple pure Python implementation of strtod, and using MPFR (via the Python bigfloat module), with the same result:

    >>> from test_dtoa import strtod
    >>> strtod(s)  # result via a simple pure Python implementation of strtod
    9.43934311931807e-309
    >>> from bigfloat import *
    >>> with double_precision: x = float(BigFloat(s))
    >>> x   # result from MPFR, via the bigfloat module
    9.43934311931807e-309

    @mdickinson
    Member

    Okay, I think I've found the cause of the second rounding bug above: at the end of the bigcomp function there's a correction block that looks like

        ...
        else if (dd < 0) {
            if (!dsign)     /* does not happen for round-near */
              retlow1:
                dval(rv) -= ulp(rv);
        }
        else if (dd > 0) {
            if (dsign) {
              rethi1:
                dval(rv) += ulp(rv);
            }
        }
        else ...

    The problem is that the += and -= corrections don't take into account the possibility that bc->scale is nonzero, and for the case where the scaled rv is subnormal, they'll typically have no effect.

    I'll work on a fix... tomorrow.
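
    A standalone demonstration of the arithmetic behind this (a sketch, not dtoa.c code): in the subnormal range dtoa.c keeps the working value scaled up (by 2*P = 106 bits, taken as given here), so one ulp of the stored value is smaller than one ulp of the final subnormal result expressed on the same scale, and the unscaled correction is simply rounded away again. The example value is the expected result from the message above; everything else is illustrative and assumes IEEE 754 doubles with round-to-nearest.

    /* Sketch, not dtoa.c code: why "rv += ulp(rv)" can be a no-op when rv is
     * held scaled by 2**106 and the final result is subnormal. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const int scale = 2 * 53;                  /* 2*P, as used by dtoa.c   */
        double rv_true = 9.43934311931807e-309;    /* subnormal result above   */
        double rv_stored = ldexp(rv_true, scale);  /* the scaled working value */

        /* One ulp of the stored (normal) value ...                            */
        double ulp_stored = nextafter(rv_stored, INFINITY) - rv_stored;
        /* ... versus the subnormal spacing 2**-1074, expressed on the same
         * scaled axis: the step an ulp-correction actually needs.             */
        double step_needed = ldexp(ldexp(1.0, -1074), scale);

        printf("ulp(stored) = %a\n", ulp_stored);   /* 0x1p-970: too small     */
        printf("step needed = %a\n", step_needed);  /* 0x1p-968                */

        /* Apply the too-small step, then unscale: the correction is lost.     */
        double corrected = ldexp(rv_stored + ulp_stored, -scale);
        printf("correction lost: %d\n", corrected == rv_true);   /* prints 1   */
        return 0;
    }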

    @mdickinson
    Member

    Second patch, adding a fix for the rounding bug to the first patch.

    @mdickinson
    Member

    Here's the (rather crude) testing program that turned up these errors.

    @mdickinson
    Member

    One more incorrectly rounded result, this time for a normal number:

    AssertionError: Incorrectly rounded str->float conversion for 99999999999999994487665465554760717039532578546e-47: expected 0x1.0000000000000p+0, got 0x1.fffffffffffffp-1

    @tim-one
    Member

    tim-one commented Jan 10, 2010

    Showing once again that a proof of FP code correctness is about as compelling as a proof of God's ontological status ;-)

    Still, have to express surprised admiration for 99999999999999994487665465554760717039532578546e-47! That one's not even close to being a "hard" case.

    @mdickinson
    Member

    Showing once again that a proof of FP code correctness is about as
    compelling as a proof of God's ontological status ;-)

    Clearly we need a 1000-page Isabelle/HOL-style machine-checked formal proof, rather than a ten-page TeX proof. Any takers?

    All of the above bugs seem to have been introduced with the new 'bigcomp' code that arrived on March 16, 2009, just a couple of weeks before I downloaded the version that got adapted for Python; in retrospect, I probably should have used the NO_STRTOD_BIGCOMP #define to bypass the new code.

    @mdickinson
    Member

    Progress report: I've had a response, and fix, from David Gay for the first 2 bugs (Stefan's original bug and the incorrect subnormal result); I'm still arguing with him about a 3rd one (not reported here; there's some possibly incorrect code in bigcomp that probably never actually gets called). I reported the 4th bug (the incorrect rounding for values near 1) to him today. In the mean time, here's bug number 5, found by eyeballing the bigcomp code until it surrendered. :-)

    >>> 1000000000000000000000000000000000000000e-16
    1e+23
    >>> 10000000000000000000000000000000000000000e-17
    1.0000000000000001e+23

    @mdickinson
    Member

    Fixed the crash that Stefan originally reported in r77450. That revision also removes the 'possibly incorrect code in bigcomp that probably never actually gets called'.

    @mdickinson
    Member

    Second bug fixed in r77451 (trunk), using a fix from David Gay, modified slightly for correctness.

    @mdickinson
    Member

    Merged fixes so far, and a couple of other cleanups, to py3k in r77452, and release31-maint in r77453.

    @mdickinson
    Member

    Just so I don't forget, there are a couple more places in dtoa.c that look suspicious and need to be checked; I haven't tried to generate failures for them yet. Since we're up to bug 5, I'll number these 6 and 7:

    (6) at the end of bigcomp, when applying the round-to-even rule for halfway cases, the lsb of rv is checked. This looks wrong if bc->scale is nonzero (see the sketch after this list).

    (7) In the main strtod correction loop, after computing delta and i, there's a block:

                    bc.nd = nd;
                    i = -1; /* Discarded digits make delta smaller. */

    This logic seems invalid if all the discarded digits are zero. (This is the same logic error as the one causing bug 5: the bigcomp comparison code also assumes incorrectly that digit nd-1 of s0 is nonzero.)
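
    A small standalone illustration of point (6) above (a sketch, not dtoa.c code; it assumes the usual IEEE 754 binary64 bit layout and the 2*P = 106-bit scaling): for a value kept scaled, the low significand bit of the stored representation need not match the low bit of the final subnormal result, so a round-half-to-even test on the scaled lsb can choose the wrong neighbour.

    /* Sketch only: lsb of a scaled working value vs. lsb of the final
     * subnormal result.  low_bit() assumes IEEE 754 binary64 layout. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static unsigned low_bit(double d)
    {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);    /* read the raw representation */
        return (unsigned)(bits & 1);
    }

    int main(void)
    {
        double final = ldexp(3.0, -1074);     /* tiny subnormal, odd significand */
        double scaled = ldexp(final, 2 * 53); /* the scaled value bigcomp sees   */
        printf("lsb(final)  = %u\n", low_bit(final));   /* 1 */
        printf("lsb(scaled) = %u\n", low_bit(scaled));  /* 0 */
        return 0;
    }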

    @mdickinson
    Member

    Bug 6 is indeed a bug: an example incorrectly-rounded string is:
    '104308485241983990666713401708072175773165034278685682646111762292409330928739751702404658197872319129036519947435319418387839758990478549477777586673075945844895981012024387992135617064532141489278815239849108105951619997829153633535314849999674266169258928940692239684771590065027025835804863585454872499320500023126142553932654370362024104462255244034053203998964360882487378334860197725139151265590832887433736189468858614521708567646743455601905935595381852723723645799866672558576993978025033590728687206296379801363024094048327273913079612469982585674824156000783167963081616214710691759864332339239688734656548790656486646106983450809073750535624894296242072010195710276073042036425579852459556183541199012652571123898996574563824424330960027873516082763671875e-1075'

    It's fixed in r77491. I'll add tests once the remaining (known) dtoa.c bugs are fixed.

    @mdickinson
    Member

    Bug 4 fixed in r77492. This just leaves bugs 5 and 7; I have a fix for these in the works.

    @mdickinson
    Member

    Tests committed in r77493.

    @mdickinson
    Member

    Fixes and tests so far merged to py3k in r77494, release31-maint in r77496.

    @mdickinson
    Member

    I was considering downgrading this to 'normal'. Then I found Bug 8, and it's a biggie:

    >>> 10.900000000000000012345678912345678912345
    10.0

    Now I'm thinking it should be upgraded to release blocker instead.

    The cause is in the _Py_dg_strtod block that starts: 'if (nd > STRTOD_DIGLIM) {'... It truncates the input to 18 digits, and then deletes trailing zeros. But the code that deletes the zeros is buggy, and passes over the digit '9' just after the point.

    @tim-one
    Member

    tim-one commented Jan 15, 2010

    Mark, I agree that last one should be a release blocker -- it's truly dreadful.

    BTW, did you guess in advance just how many bugs there could be in this kind of code? I did ;-)

    @mdickinson
    Member

    Upgrading to release blocker. It'll almost certainly be fixed before the weekend is out. (And I will, of course, report it upstream.)

    @mdickinson
    Member

    Here's a patch for the release blocker.

    Eric, would you be interested in double checking the logic for this patch?

    Tim: No, I have to admit I didn't foresee quite this number of bugs. :)

    @mdickinson mdickinson changed the title dtoa.c: oversize b in quorem dtoa.c: oversize b in quorem, and a menagerie of other bugs Jan 15, 2010
    @mdickinson
    Member

    issue7632_bug8.patch uploaded to Rietveld:

    http://codereview.appspot.com/186168

    @ericvsmith
    Member

    It looks correct to me, assuming this comment is correct:

     /* scan back until we hit a nonzero digit.  significant digit 'i'
        is s0[i] if i < nd0, s0[i+1] if i >= nd0. */
    

    I didn't verify the comment itself.
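
    A standalone sketch of that indexing rule (illustrative names only, not the code from issue7632_bug8.patch): scanning back over the truncated digit buffer has to remember that s0[nd0] is the '.' itself, which is exactly what keeps the '9' of 10.9 from being trimmed away.

    /* Sketch only: trim trailing zeros from a truncated digit buffer that
     * still contains the decimal point.  Significant digit i is s0[i] for
     * i < nd0 and s0[i+1] for i >= nd0, because s0[nd0] is the '.'. */
    #include <stdio.h>
    #include <string.h>

    static int trim_trailing_zeros(const char *s0, int nd, int nd0)
    {
        int i;
        for (i = nd - 1; i >= 0; i--) {
            char digit = (i < nd0) ? s0[i] : s0[i + 1];
            if (digit != '0')
                break;                /* stop at the last nonzero digit */
        }
        return i + 1;                 /* number of digits kept */
    }

    int main(void)
    {
        /* The bug-8 input after truncation (digit count here is illustrative). */
        const char *s0 = "10.900000000000000";
        int nd0 = 2;                          /* digits before the '.'          */
        int nd = (int)strlen(s0) - 1;         /* digit count, excluding the '.' */
        printf("digits kept: %d (the '9' survives, so 3)\n",
               trim_trailing_zeros(s0, nd, nd0));
        return 0;
    }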

    @ericvsmith
    Member

    I have a few minor comments posted on Rietveld, but nothing that would keep you from checking this in.

    @mdickinson
    Member

    Applied the bug 8 patch in r77519 (thanks Eric for reviewing!). For safety, I'll leave this as a release blocker until fixes have been merged to py3k and release31-maint.

    I've uploaded a fix for bugs 5 and 7 to Rietveld:

    http://codereview.appspot.com/186182

    I still don't like the parsing code much: I'm tempted to pull out the calculation of y and z and do it after the parsing is complete. It's probably marginally less efficient that way, but it would help make the code clearer.

    @mdickinson
    Member

    I've applied a minimal fix for bugs 5 and 7 in r77530 (trunk). (I wasn't able to produce any strings that trigger bug 7, so it may not technically be a bug.)

    I'm continuing to review, comment, and clean up the remainder of the _Py_dg_strtod code.

    @mdickinson
    Member

    Fixes merged to py3k and release31-maint in r77535 and r77537.

    @mdickinson
    Member

    One of the buildbots just produced a MemoryError from test_strtod:

    http://www.python.org/dev/buildbot/all/builders/i386%20Ubuntu%203.x/builds/411

    It looks as though there's a memory leak somewhere in dtoa.c. It's a bit difficult to tell, though, since the memory allocation functions in that file deliberately hold on to small pieces of memory.

    @mdickinson
    Member

    Okay, so there's a memory leak for overflowing values: if an overflow is detected in the main correction loop of _Py_dg_strtod, then 'references' to bd0, bd, bb, bs and delta aren't released.

    There may be other leaks; I'm trying to come up with a good way to detect them reliably.
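
    For reference, the usual shape of such a fix (a sketch only, with malloc/free standing in for Balloc/Bfree and a simulated overflow test; this is not the change that was eventually committed): route every early exit, including the overflow one, through a single label that releases the working values named above.

    /* Sketch only: single-exit cleanup for a correction loop.  All names,
     * sizes and the overflow test are illustrative. */
    #include <math.h>
    #include <stdlib.h>

    static double convert(int simulate_overflow)
    {
        void *bb = NULL, *bd = NULL, *bs = NULL, *bd0 = NULL, *delta = NULL;
        double rv = 0.0;

        bd0 = malloc(64); bb = malloc(64); bd = malloc(64); bs = malloc(64);
        if (!bd0 || !bb || !bd || !bs)
            goto done;

        for (;;) {                      /* the main correction loop        */
            delta = malloc(64);         /* diff(bb, bd) in the real code   */
            if (!delta)
                goto done;
            if (simulate_overflow) {
                rv = HUGE_VAL;          /* the buggy path returned here    */
                goto done;              /* without releasing anything      */
            }
            free(delta);
            delta = NULL;
            break;                      /* converged                       */
        }
        rv = 1.0;

      done:                             /* one exit point frees everything */
        free(delta); free(bs); free(bd); free(bb); free(bd0);
        return rv;
    }

    int main(void)
    {
        convert(0);
        convert(1);
        return 0;
    }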

    @skrah
    Mannequin Author

    skrah mannequin commented Jan 16, 2010

    This is what Valgrind complains about:

    ==4750== 3,456 (1,440 direct, 2,016 indirect) bytes in 30 blocks are definitely lost in loss record 3,302 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C286E: _Py_dg_strtod (dtoa.c:1675)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)
    ==4750== by 0x47F1CF: call_function (ceval.c:3891)

    ==4750== 9,680 bytes in 242 blocks are still reachable in loss record 3,369 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0875: i2b (dtoa.c:556)
    ==4750== by 0x4C2906: _Py_dg_strtod (dtoa.c:1687)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 270,720 bytes in 1,692 blocks are indirectly lost in loss record 3,423 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0F97: diff (dtoa.c:825)
    ==4750== by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 382,080 bytes in 2,388 blocks are indirectly lost in loss record 3,424 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0C82: lshift (dtoa.c:730)
    ==4750== by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 414,560 bytes in 2,591 blocks are indirectly lost in loss record 3,425 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0C82: lshift (dtoa.c:730)
    ==4750== by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 414,960 (414,768 direct, 192 indirect) bytes in 2,604 blocks are definitely lost in loss record 3,426 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0929: mult (dtoa.c:592)
    ==4750== by 0x4C0B90: pow5mult (dtoa.c:691)
    ==4750== by 0x4C2B1A: _Py_dg_strtod (dtoa.c:1753)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)

    ==4750== 890,720 (532,960 direct, 357,760 indirect) bytes in 3,331 blocks are definitely lost in loss record 3,428 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0C82: lshift (dtoa.c:730)
    ==4750== by 0x4C2AD1: _Py_dg_strtod (dtoa.c:1744)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 1,021,280 (566,080 direct, 455,200 indirect) bytes in 3,538 blocks are definitely lost in loss record 3,429 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0C82: lshift (dtoa.c:730)
    ==4750== by 0x4C2BA9: _Py_dg_strtod (dtoa.c:1771)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    ==4750== 1,465,280 (676,640 direct, 788,640 indirect) bytes in 4,229 blocks are definitely lost in loss record 3,430 of 3,430
    ==4750== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
    ==4750== by 0x41B7B5: PyMem_Malloc (object.c:1740)
    ==4750== by 0x4C03CF: Balloc (dtoa.c:352)
    ==4750== by 0x4C0F97: diff (dtoa.c:825)
    ==4750== by 0x4C2BED: _Py_dg_strtod (dtoa.c:1779)
    ==4750== by 0x4BEDF2: _PyOS_ascii_strtod (pystrtod.c:103)
    ==4750== by 0x4BEF61: PyOS_string_to_double (pystrtod.c:345)
    ==4750== by 0x543968: PyFloat_FromString (floatobject.c:192)
    ==4750== by 0x546E74: float_new (floatobject.c:1569)
    ==4750== by 0x42B5C9: type_call (typeobject.c:664)
    ==4750== by 0x516442: PyObject_Call (abstract.c:2160)
    ==4750== by 0x47FDAE: do_call (ceval.c:4088)

    @mdickinson
    Member

    Stefan, thanks for that! I'm not entirely sure how to make use of it, though. Is there a way to tell Valgrind that some leaks are expected?

    The main problem with leak detection is that dtoa.c deliberately keeps hold of any malloc'ed chunks less than a certain size (which I think is something like 2048 bytes, but I'm not sure). These chunks are never freed in normal use; instead, they're added to a bunch of free lists for the next time that strtod or dtoa is called. The logic isn't too complicated: it's in the functions Balloc and Bfree in dtoa.c.

    So the right thing to do is just to check that for each call to strtod, the total number of calls to Balloc matches the total number of calls to Bfree with non-NULL argument. And similarly for dtoa, except that in that case one of the Balloc'd blocks gets returned to the caller (it's the caller's responsibility to call free_dtoa to free it when it's no longer needed), so there should be a difference of 1.

    And there's one further wrinkle: dtoa.c maintains a list of powers of 5 of the form 5**2**k, and this list is automatically extended with newly allocated Bigints when necessary: those Bigints are never freed either, so calls to Balloc from that source should be ignored. Another way round this is just to ignore any leak from the first call to strtod, and then do a repeat call with the same parameters; the second call will already have all the powers of 5 it needs.
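
    A sketch of that bookkeeping (illustrative instrumentation, not the attached dtoa_detect_leaks.patch): in a real build the two counters would be bumped inside Balloc and Bfree themselves; here a toy conversion routine stands in for _Py_dg_strtod, with a switch to inject a deliberate leak.

    #include <stdio.h>
    #include <stdlib.h>

    static long balloc_calls, bfree_calls;

    static void *toy_Balloc(size_t n) { balloc_calls++; return malloc(n); }
    static void  toy_Bfree(void *v)   { if (v) { bfree_calls++; free(v); } }

    /* Stand-in for one string -> double conversion; `leak` injects a bug. */
    static void toy_strtod(int leak)
    {
        void *bb = toy_Balloc(32), *bd = toy_Balloc(32);
        toy_Bfree(bd);
        if (!leak)
            toy_Bfree(bb);
    }

    /* The expected difference is 0 for strtod (it would be 1 for dtoa, which
     * hands one Balloc'd block back to its caller).  To sidestep the
     * powers-of-5 cache, convert twice and only check the second call. */
    static void check(const char *label, int leak)
    {
        toy_strtod(leak);                     /* warm-up call: ignored */
        balloc_calls = bfree_calls = 0;
        toy_strtod(leak);
        printf("%s: Balloc - Bfree = %ld (want 0)\n",
               label, balloc_calls - bfree_calls);
    }

    int main(void)
    {
        check("clean path", 0);
        check("leaky path", 1);
        return 0;
    }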

    @mdickinson
    Member

    Upgrading to release blocker again: the memory leak should be fixed for 2.7 (and more immediately, for 3.1.2).

    @skrah
    Mannequin Author

    skrah mannequin commented Jan 17, 2010

    Mark, thanks for the explanation! You can generate suppressions for the Misc/valgrind-python.supp file, but you have to know exactly which errors can be ignored.

    Going through the Valgrind output again, it looks like most of it is about what you already mentioned (bd0, bd, bb, bs and delta not being released).

    Would it be much work to provide Valgrind-friendly versions of Balloc, Bfree and pow5mult? Balloc and Bfree are already mentioned in an XXX comment; pow5mult should be a slow version that doesn't cache anything. Perhaps these could be ifdef'd with Py_USING_MEMORY_DEBUGGER.
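
    A toy sketch of the two pow5mult strategies being discussed (plain doubles instead of Bigints so the example stays self-contained; Py_USING_MEMORY_DEBUGGER is the ifdef proposed above, everything else is illustrative): the normal build memoizes 5**(2**i) in a cache that is never freed, which is what keeps showing up in Valgrind's output, while the debugger build recomputes from scratch and holds no global state.

    #include <stdio.h>

    #ifndef Py_USING_MEMORY_DEBUGGER
    /* Normal build: cache 5**(2**i) the first time it is needed and keep it
     * forever -- in the real code these are Bigints that are never freed.  */
    static double cache[64];

    static double pow5mult(double b, int k)
    {
        for (int i = 0; k; k >>= 1, i++) {
            if (k & 1) {
                if (cache[i] == 0.0) {
                    double p = 5.0;
                    for (int j = 0; j < i; j++)
                        p *= p;             /* p = 5**(2**i) */
                    cache[i] = p;
                }
                b *= cache[i];
            }
        }
        return b;
    }
    #else
    /* Memory-debugger build: recompute every time and hold no global state,
     * so every allocation in the real code is paired with a free.          */
    static double pow5mult(double b, int k)
    {
        while (k--)
            b *= 5.0;
        return b;
    }
    #endif

    int main(void)
    {
        printf("7 * 5**10 = %.0f\n", pow5mult(7.0, 10));   /* 68359375 */
        return 0;
    }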

    @mdickinson
    Member

    Stefan, I'm not particularly familiar with Valgrind: can you tell me what would need to be done? Is a non-caching version of pow5mult all that's required?

    Here's the patch that I'm using to detect leaks at the moment. (It includes a slow pow5mult version.)

    @skrah
    Mannequin Author

    skrah mannequin commented Jan 17, 2010

    With the latest dtoa.c, your non-caching pow5mult, and a quick hack for Balloc and Bfree, I get zero (dtoa.c-related) Valgrind errors.

    So the attached memory_debugger.diff is pretty much all that's needed for Valgrind.

    @mdickinson
    Member

    Thanks, Stefan. Applied in r77589 (trunk), r77590 (py3k), r77591 (release31-maint) with one small change: I moved the freelist and p5s declarations inside the #ifndef Py_USING_MEMORY_DEBUGGER conditionals.

    The leak itself was fixed in revisions r77578 through r77580; from Stefan's Valgrind report, and my own refcount testing, it looks as though that was the only leak point.

    I haven't finished reviewing/testing the _Py_dg_strtod code yet, but I'm going to close this issue; if anything new turns up I'll open another one.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022