
FileIO.readall() has worst case O(n^2) complexity #59962

Closed
sbt mannequin opened this issue Aug 21, 2012 · 25 comments
Labels
performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments


sbt mannequin commented Aug 21, 2012

BPO 15758
Nosy @pitrou, @vstinner
Files
  • readall-benchmark.py
  • readall.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-05-18.14:19:37.296>
    created_at = <Date 2012-08-21.22:25:09.945>
    labels = ['library', 'performance']
    title = 'FileIO.readall() has worst case O(n^2) complexity'
    updated_at = <Date 2013-05-18.14:19:37.295>
    user = 'https://bugs.python.org/sbt'

    bugs.python.org fields:

    activity = <Date 2013-05-18.14:19:37.295>
    actor = 'sbt'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-05-18.14:19:37.296>
    closer = 'sbt'
    components = ['Library (Lib)']
    creation = <Date 2012-08-21.22:25:09.945>
    creator = 'sbt'
    dependencies = []
    files = ['26986', '30295']
    hgrepos = []
    issue_num = 15758
    keywords = ['patch']
    message_count = 25.0
    messages = ['168813', '168819', '168825', '168826', '168829', '168862', '168863', '168865', '168867', '168876', '168882', '168884', '168944', '168945', '169020', '169021', '169022', '169044', '169047', '189384', '189408', '189450', '189459', '189487', '189489']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'schmir', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue15758'
    versions = ['Python 3.4']


    sbt mannequin commented Aug 21, 2012

    Piping significant amounts of data through a subprocess using Popen.communicate() is crazily slow on Windows.

    The attached program just pushes data through mingw's cat.exe.

    Python 3.3:
    amount = 1 MB; time taken = 0.07 secs; rate = 13.51 MB/s
    amount = 2 MB; time taken = 0.31 secs; rate = 6.51 MB/s
    amount = 4 MB; time taken = 1.30 secs; rate = 3.08 MB/s
    amount = 8 MB; time taken = 5.43 secs; rate = 1.47 MB/s
    amount = 16 MB; time taken = 21.64 secs; rate = 0.74 MB/s
    amount = 32 MB; time taken = 87.36 secs; rate = 0.37 MB/s

    Python 2.7:
    amount = 1 MB; time taken = 0.02 secs; rate = 66.67 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 68.97 MB/s
    amount = 4 MB; time taken = 0.05 secs; rate = 76.92 MB/s
    amount = 8 MB; time taken = 0.10 secs; rate = 82.47 MB/s
    amount = 16 MB; time taken = 0.27 secs; rate = 60.38 MB/s
    amount = 32 MB; time taken = 0.88 secs; rate = 36.36 MB/s
    amount = 64 MB; time taken = 3.20 secs; rate = 20.03 MB/s
    amount = 128 MB; time taken = 12.36 secs; rate = 10.35 MB/s

    For Python 3.3 this looks like O(n^2) complexity to me. 2.7 is better but still struggles for large amounts.

    Changing Popen._readerthread() to read in chunks rather than using FileIO.readall() produces a huge speed up:

    Python 3.3 with patch:
    amount = 1 MB; time taken = 0.01 secs; rate = 76.92 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 76.92 MB/s
    amount = 4 MB; time taken = 0.04 secs; rate = 111.10 MB/s
    amount = 8 MB; time taken = 0.05 secs; rate = 148.14 MB/s
    amount = 16 MB; time taken = 0.10 secs; rate = 156.85 MB/s
    amount = 32 MB; time taken = 0.16 secs; rate = 198.75 MB/s
    amount = 64 MB; time taken = 0.31 secs; rate = 205.78 MB/s
    amount = 128 MB; time taken = 0.61 secs; rate = 209.82 MB/s

    Maybe FileIO.readall() should do something similar for files whose size cannot be determined by stat().
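    What reading "in chunks" amounts to can be sketched in Python (an illustrative model of the RawIOBase.readall() strategy, not the actual implementation):

    ```python
    import io

    def readall_chunked(raw, chunk_size=io.DEFAULT_BUFFER_SIZE):
        # Accumulate fixed-size reads in a list and join once at the end,
        # so the total amount of copying stays linear in the output size.
        chunks = []
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    ```

    Joining once at the end is what keeps this O(n); growing a single bytes object on every iteration would reintroduce the quadratic copying.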

    sbt mannequin added the stdlib (Python modules in the Lib dir) and performance (Performance or resource usage) labels Aug 21, 2012

    pitrou commented Aug 21, 2012

    Yes, I think FileIO.readall() should be fixed rather than avoided.


    sbt mannequin commented Aug 21, 2012

    RawIOBase.readall() does the sensible thing already. Maybe FileIO should be allowed to inherit it.

    The alternative patch (which probably only works for the raw unbuffered case):

    diff -r ca54c27a9045 Lib/subprocess.py
    --- a/Lib/subprocess.py Tue Aug 21 14:54:22 2012 +0100
    +++ b/Lib/subprocess.py Wed Aug 22 00:39:32 2012 +0100
    @@ -1152,7 +1152,7 @@

             def _readerthread(self, fh, buffer):
    -            buffer.append(fh.read())
    +            buffer.append(io.RawIOBase.readall(fh))
                 fh.close()

    produces

    amount = 1 MB; time taken = 0.01 secs; rate = 71.42 MB/s
    amount = 2 MB; time taken = 0.02 secs; rate = 83.33 MB/s
    amount = 4 MB; time taken = 0.04 secs; rate = 105.26 MB/s
    amount = 8 MB; time taken = 0.06 secs; rate = 135.59 MB/s
    amount = 16 MB; time taken = 0.09 secs; rate = 170.20 MB/s
    amount = 32 MB; time taken = 0.17 secs; rate = 190.46 MB/s
    amount = 64 MB; time taken = 0.32 secs; rate = 202.52 MB/s
    amount = 128 MB; time taken = 0.61 secs; rate = 211.56 MB/s


    pitrou commented Aug 21, 2012

    FileIO.readall() already has an overallocation mechanism which should yield linear complexity. Perhaps it needs to be tweaked a bit?
    (look at new_buffersize in Modules/_io/fileio.c)

    By the way, your results are a bit weird. Why does the data rate increase with the transferred amount?


    sbt mannequin commented Aug 22, 2012

    FileIO.readall() already has an overallocation mechanism which should
    yield linear complexity. Perhaps it needs to be tweaked a bit?
    (look at new_buffersize in Modules/_io/fileio.c)

    I think it needs a bit more than a tweak ;-). It looks like the buffer increases by 12.5% at a time, but it does not seem to work as intended.

    By the way, your results are a bit weird. Why does the data rate increase
    with the transferred amount?

    The rates are completely unreliable for those benchmarks which take less than 0.1 seconds. They level off after 0.1 secs when the overheads become insignificant.
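    The heuristic under discussion looks roughly like this when transcribed into Python (paraphrasing new_buffersize() from Modules/_io/fileio.c of that era; the exact constants should be treated as illustrative):

    ```python
    def new_buffersize(currentsize, end=None, pos=0):
        # If fstat()/lseek() yield a usable end-of-file estimate, size the
        # buffer to hold the rest of the file plus one spare byte (so EOF
        # can be detected without a further resize).
        if end is not None and end >= pos >= 0:
            return currentsize + end - pos + 1
        # Otherwise grow by about 12.5%, intended to give amortized
        # linear-time behaviour with less memory overhead than doubling.
        return currentsize + (currentsize >> 3) + 6
    ```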


    sbt mannequin commented Aug 22, 2012

    In each loop before calling read() the buffer size is recalculated based on the amount of space used, causing a realloc *regardless* of how much empty space is left in the buffer. And each read is only producing a smallish chunk (5120 bytes).

    So assuming realloc fails to do an inplace realloc (and Windows is reputedly very bad at that) the amount of data copied will be O(n^2).
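    A toy model makes the asymptotic difference between the two resize policies concrete (illustrative Python, assuming the worst case in which every realloc() copies the whole buffer; the 5120-byte chunk matches the pipe reads mentioned above):

    ```python
    CHUNK = 5120  # smallish per-read() amount observed on the pipe

    def bytes_copied_resize_every_read(total):
        # Old behaviour: recompute/realloc the buffer before every read,
        # even when plenty of slack remains -> O(n^2) copying overall.
        used = copied = 0
        while used < total:
            copied += used              # realloc copies everything buffered
            used += CHUNK
        return copied

    def bytes_copied_resize_when_full(total):
        # Fixed behaviour: grow (here: double) only when the buffer is
        # actually full -> O(n) copying overall.
        used = copied = 0
        size = CHUNK
        while used < total:
            if used + CHUNK > size:
                copied += used          # copy only on the rare growth steps
                size *= 2
            used += CHUNK
        return copied
    ```

    For 32 MB, the first policy copies on the order of a hundred gigabytes in this model, while the second copies less than twice the payload.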


    sbt mannequin commented Aug 22, 2012

    The attached patch (readall-resize.patch) makes the resizes only happen when the buffer is full. It produces

    amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 64.10 MB/s
    amount = 4 MB; time taken = 0.06 secs; rate = 64.10 MB/s
    amount = 8 MB; time taken = 0.08 secs; rate = 102.56 MB/s
    amount = 16 MB; time taken = 0.17 secs; rate = 93.24 MB/s
    amount = 32 MB; time taken = 0.31 secs; rate = 102.56 MB/s
    amount = 64 MB; time taken = 0.70 secs; rate = 91.17 MB/s
    amount = 128 MB; time taken = 1.81 secs; rate = 70.73 MB/s


    sbt mannequin commented Aug 22, 2012

    Alternative patch (readall-chunks.patch) which delegates to RawIOBase.readall() if the file cannot be stat-ed or appears to have size zero.

    amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
    amount = 2 MB; time taken = 0.02 secs; rate = 128.21 MB/s
    amount = 4 MB; time taken = 0.03 secs; rate = 128.20 MB/s
    amount = 8 MB; time taken = 0.05 secs; rate = 170.94 MB/s
    amount = 16 MB; time taken = 0.09 secs; rate = 170.94 MB/s
    amount = 32 MB; time taken = 0.16 secs; rate = 205.13 MB/s
    amount = 64 MB; time taken = 0.30 secs; rate = 215.92 MB/s
    amount = 128 MB; time taken = 0.58 secs; rate = 221.76 MB/s


    pitrou commented Aug 22, 2012

    Alternative patch (readall-chunks.patch) which delegates to
    RawIOBase.readall() if the file cannot be stat-ed or appears to have
    size zero.

    It seems we could do both (fix the resizing logic, and fallback on
    chunks if the size is unknown).


    sbt mannequin commented Aug 22, 2012

    It seems we could do both (fix the resizing logic, and fallback on
    chunks if the size is unknown).

    Combined patch attached.


    pitrou commented Aug 22, 2012

    If new_buffersize() is only used with HAVE_FSTAT, then its code can probably be simplified.

    @vstinner

    I'm lost, there are too many patches. Could you please remove the old/useless ones? Patching the subprocess module is not a good solution: if there is an issue in the io module, the io module should be fixed, and then this patch can be removed.


    sbt mannequin commented Aug 23, 2012

    Here is the patch (with the old ones removed).

    Note that the old code mishandled the case where _PyBytes_Resize() failed by assuming that the old bytes object would still be valid.

    I have assumed that stream pseudo-files will never claim to have a size greater than zero. The code will still work if this assumption is false.
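    The combined behaviour can be summarised in Python (a simplified sketch of the intended logic, not the actual fileio.c code; the function name and chunk size are illustrative):

    ```python
    import os

    def readall_sketch(fd, chunk=8192):
        # Sketch of the combined strategy: preallocate using fstat() when it
        # reports a positive size (regular files), and fall back to a list
        # of chunks when the size is unknown or zero (pipes, sockets, and
        # stream pseudo-files such as those under /proc).
        # (The real code also subtracts the current file position; omitted.)
        try:
            size = os.fstat(fd).st_size
        except OSError:
            size = 0
        if size > 0:
            # Regular file: one right-sized read; the chunk loop below mops
            # up any short read or bytes appended after the stat.
            pieces = [os.read(fd, size)]
        else:
            pieces = []
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            pieces.append(buf)
        return b"".join(pieces)
    ```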

    sbt mannequin changed the title from "500x speed up for Popen.communicate() on Windows" to "FileIO.readall() has worst case O(n^2) complexity" Aug 23, 2012

    pitrou commented Aug 23, 2012

    Here is the patch (with the old ones removed).

    Note that the old code mishandled the case where _PyBytes_Resize()
    failed by assuming that the old bytes object would still be valid.

    I have assumed that stream pseudo-files will never claim to have a
    size greater than zero. The code will still work if this assumption
    is false.

    I haven't measured under Windows, but this looks ok to me.


    sbt mannequin commented Aug 24, 2012

    After playing with the patch on Linux it seems that Linux much prefers the realloc() scheme to the list-of-chunks scheme.

    This new patch only does list-of-chunks on Windows.


    sbt mannequin commented Aug 24, 2012

    Attached is an alternative benchmark program which does not use subprocess.
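    The attachment itself was not migrated, but a subprocess-free benchmark along the same lines can be sketched with an OS pipe and a writer thread (illustrative; bench_readall is a made-up name, not the attached script):

    ```python
    import os
    import threading
    import time

    def bench_readall(amount, chunk=65536):
        # Feed `amount` bytes through a pipe from a background thread and
        # time how long FileIO.readall() takes to drain it on this end.
        r, w = os.pipe()

        def writer():
            payload = b"x" * chunk
            sent = 0
            with open(w, "wb", buffering=0) as out:  # closes w -> reader sees EOF
                while sent < amount:
                    sent += out.write(payload[:amount - sent])

        t = threading.Thread(target=writer)
        t.start()
        start = time.perf_counter()
        with open(r, "rb", buffering=0) as f:  # buffering=0 -> raw FileIO
            data = f.readall()
        elapsed = time.perf_counter() - start
        t.join()
        return len(data), elapsed
    ```

    Running it for increasing amounts (e.g. 1 MB up to 128 MB) reproduces the kind of table shown earlier in the thread.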


    pitrou commented Aug 24, 2012

    After playing with the patch on Linux it seems that Linux much prefers
    the realloc() scheme to the list-of-chunks scheme.

    What about the method call overhead in RawIO.readall(), and the
    different progression of buffer sizes? (the realloc scheme uses larger
    and larger read() sizes, while RawIO.readall() uses a constant read()
    size).

    By the way, not every non-Windows OS is Linux, so the patch is wrong.


    sbt mannequin commented Aug 24, 2012

    What about the method call overhead in RawIO.readall(), and the
    different progression of buffer sizes? (the realloc scheme uses larger
    and larger read() sizes, while RawIO.readall() uses a constant read()
    size).

    For this benchmark the call overhead does not seem to be noticeable, and using larger or adaptive read buffers does not seem to help either. (I have tried both on Linux.)

    By the way, not every non-Windows OS is Linux, so the patch is wrong.

    Wrong in the sense of not necessarily optimal for unknown platforms? Well, the patch retains the old (intended) behaviour on other platforms, so I would call that conservative rather than wrong. Are you suggesting switching behaviour depending on whether some macro is defined?

    64-bit Linux with current patch (and using new benchmark):
    amount = 1 MB; time taken = 0.003 secs; rate = 317.22 MB/s
    amount = 2 MB; time taken = 0.007 secs; rate = 283.74 MB/s
    amount = 4 MB; time taken = 0.011 secs; rate = 359.58 MB/s
    amount = 8 MB; time taken = 0.020 secs; rate = 395.58 MB/s
    amount = 16 MB; time taken = 0.030 secs; rate = 528.18 MB/s
    amount = 32 MB; time taken = 0.051 secs; rate = 627.72 MB/s
    amount = 64 MB; time taken = 0.088 secs; rate = 726.36 MB/s
    amount = 128 MB; time taken = 0.133 secs; rate = 960.23 MB/s
    amount = 256 MB; time taken = 0.258 secs; rate = 992.32 MB/s
    amount = 512 MB; time taken = 0.482 secs; rate = 1062.30 MB/s

    On 64-bit Linux with previous patch:
    amount = 1 MB; time taken = 0.006 secs; rate = 158.07 MB/s
    amount = 2 MB; time taken = 0.011 secs; rate = 177.23 MB/s
    amount = 4 MB; time taken = 0.024 secs; rate = 169.32 MB/s
    amount = 8 MB; time taken = 0.047 secs; rate = 170.39 MB/s
    amount = 16 MB; time taken = 0.098 secs; rate = 163.65 MB/s
    amount = 32 MB; time taken = 0.220 secs; rate = 145.19 MB/s
    amount = 64 MB; time taken = 0.253 secs; rate = 253.32 MB/s
    amount = 128 MB; time taken = 0.724 secs; rate = 176.80 MB/s
    amount = 256 MB; time taken = 0.874 secs; rate = 293.02 MB/s
    amount = 512 MB; time taken = 2.292 secs; rate = 223.38 MB/s


    pitrou commented Aug 24, 2012

    For this benchmark the call overhead does not seem to be noticeable,
    and using larger or adaptive read buffers does not seem to help
    either. (I have tried both on Linux.)

    Ok, thank you.

    > By the way, not every non-Windows OS is Linux, so the patch is wrong.

    Wrong in the sense of not necessarily optimal for unknown platforms?
    Well, the patch retains the old (intended) behaviour on other
    platforms, so I would call that conservative rather than wrong.

    Hmm, you are right, there is no regression indeed.
    I guess I don't much like the idea of switching code paths based on the
    platform purely for optimization reasons, but in this case it seems
    useful (and simple enough).

    Are you suggesting switching behaviour depending on whether some macro
    is defined?

    No, that would definitely be overkill.


    sbt mannequin commented May 16, 2013

    I have done an updated patch. It no longer special cases Windows, so realloc() is always used for enlarging the buffer (except when fstat() is missing).

    Antoine, do you think this is ready to commit?


    pitrou commented May 16, 2013

    I posted a couple of review comments.


    sbt mannequin commented May 17, 2013

    Updated patch addressing Antoine's comments.


    sbt mannequin commented May 17, 2013

    New patch.


    pitrou commented May 17, 2013

    Looks fine to me.


    python-dev mannequin commented May 17, 2013

    New changeset e4c303d23d01 by Richard Oudkerk in branch 'default':
    Issue bpo-15758: Fix FileIO.readall() so it no longer has O(n**2) complexity.
    http://hg.python.org/cpython/rev/e4c303d23d01

    @sbt sbt mannequin closed this as completed May 18, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022