This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _io.FileIO uses a quadratic-time buffer growth algorithm
Type: performance Stage: resolved
Components: IO Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: nadeem.vawda Nosy List: benjamin.peterson, nadeem.vawda, pitrou, python-dev, stutzbach, vstinner
Priority: normal Keywords: patch

Created on 2011-10-12 16:02 by nadeem.vawda, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
buffer-growth.diff nadeem.vawda, 2011-10-12 17:19 review
Messages (7)
msg145403 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-12 16:01
As mentioned in issue 6715, the buffer growth strategy used by _io.FileIO
has quadratic running time if each reallocation is O(n). The code in
question is new_buffersize() from Modules/_io/fileio.c:

    if (currentsize > SMALLCHUNK) {
        /* Keep doubling until we reach BIGCHUNK;
           then keep adding BIGCHUNK. */
        if (currentsize <= BIGCHUNK)
            return currentsize + currentsize;
        else
            return currentsize + BIGCHUNK;
    }
    return currentsize + SMALLCHUNK;

Is there a reason for this? One possible improvement would be to instead
use the same formula as list.resize() in Objects/listobject.c:

    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

which has amortized O(n) running time behaviour.

Your thoughts?
msg145418 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-12 17:19
I've attached a patch that makes the suggested change to FileIO, and also
to _bz2.BZ2Compressor/Decompressor (which currently have the same issue).
msg145443 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-12 23:38
The patch looks good to me, thanks.
msg145448 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-13 11:45
New changeset d18c80a8c119 by Nadeem Vawda in branch '3.2':
Issue #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
http://hg.python.org/cpython/rev/d18c80a8c119

New changeset 4a6709a071d0 by Nadeem Vawda in branch 'default':
Merge #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
http://hg.python.org/cpython/rev/4a6709a071d0
msg145449 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-13 11:58
New changeset c1c434e30e06 by Nadeem Vawda in branch '2.7':
Issue #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one.
http://hg.python.org/cpython/rev/c1c434e30e06
msg145450 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-13 12:04
Thank you :)
msg145453 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-13 14:01
No problem :)
History
Date User Action Args
2022-04-11 14:57:22adminsetgithub: 57368
2011-10-13 14:01:59nadeem.vawdasetstatus: open -> closed
messages: + msg145453

assignee: nadeem.vawda
resolution: fixed
stage: patch review -> resolved
2011-10-13 12:04:31pitrousetmessages: + msg145450
2011-10-13 11:58:31python-devsetmessages: + msg145449
2011-10-13 11:45:44python-devsetnosy: + python-dev
messages: + msg145448
2011-10-12 23:38:22pitrousetmessages: + msg145443
versions: + Python 2.7, Python 3.2
2011-10-12 17:24:31vstinnersetnosy: + vstinner
2011-10-12 17:20:01nadeem.vawdasetstage: needs patch -> patch review
2011-10-12 17:19:15nadeem.vawdasetfiles: + buffer-growth.diff
keywords: + patch
messages: + msg145418
2011-10-12 16:02:00nadeem.vawdacreate