
FileIO.readall() has worst case O(n^2) complexity #59962

Closed
sbt mannequin opened this issue Aug 21, 2012 · 25 comments
Labels
performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments


sbt mannequin commented Aug 21, 2012

BPO 15758
Nosy @pitrou, @vstinner
Files
  • readall-benchmark.py
  • readall.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-05-18.14:19:37.296>
    created_at = <Date 2012-08-21.22:25:09.945>
    labels = ['library', 'performance']
    title = 'FileIO.readall() has worst case O(n^2) complexity'
    updated_at = <Date 2013-05-18.14:19:37.295>
    user = 'https://bugs.python.org/sbt'

    bugs.python.org fields:

    activity = <Date 2013-05-18.14:19:37.295>
    actor = 'sbt'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-05-18.14:19:37.296>
    closer = 'sbt'
    components = ['Library (Lib)']
    creation = <Date 2012-08-21.22:25:09.945>
    creator = 'sbt'
    dependencies = []
    files = ['26986', '30295']
    hgrepos = []
    issue_num = 15758
    keywords = ['patch']
    message_count = 25.0
    messages = ['168813', '168819', '168825', '168826', '168829', '168862', '168863', '168865', '168867', '168876', '168882', '168884', '168944', '168945', '169020', '169021', '169022', '169044', '169047', '189384', '189408', '189450', '189459', '189487', '189489']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'schmir', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue15758'
    versions = ['Python 3.4']


    sbt mannequin commented Aug 21, 2012

    Piping significant amounts of data through a subprocess using Popen.communicate() is crazily slow on Windows.

    The attached program just pushes data through mingw's cat.exe.

    Python 3.3:
    amount = 1 MB; time taken = 0.07 secs; rate = 13.51 MB/s
    amount = 2 MB; time taken = 0.31 secs; rate = 6.51 MB/s
    amount = 4 MB; time taken = 1.30 secs; rate = 3.08 MB/s
    amount = 8 MB; time taken = 5.43 secs; rate = 1.47 MB/s
    amount = 16 MB; time taken = 21.64 secs; rate = 0.74 MB/s
    amount = 32 MB; time taken = 87.36 secs; rate = 0.37 MB/s

    Python 2.7:
    amount = 1 MB; time taken = 0.02 secs; rate = 66.67 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 68.97 MB/s
    amount = 4 MB; time taken = 0.05 secs; rate = 76.92 MB/s
    amount = 8 MB; time taken = 0.10 secs; rate = 82.47 MB/s
    amount = 16 MB; time taken = 0.27 secs; rate = 60.38 MB/s
    amount = 32 MB; time taken = 0.88 secs; rate = 36.36 MB/s
    amount = 64 MB; time taken = 3.20 secs; rate = 20.03 MB/s
    amount = 128 MB; time taken = 12.36 secs; rate = 10.35 MB/s

    For Python 3.3 this looks like O(n^2) complexity to me. 2.7 is better but still struggles for large amounts.

    Changing Popen._readerthread() to read in chunks rather than using FileIO.readall() produces a huge speed up:

    Python 3.3 with patch:
    amount = 1 MB; time taken = 0.01 secs; rate = 76.92 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 76.92 MB/s
    amount = 4 MB; time taken = 0.04 secs; rate = 111.10 MB/s
    amount = 8 MB; time taken = 0.05 secs; rate = 148.14 MB/s
    amount = 16 MB; time taken = 0.10 secs; rate = 156.85 MB/s
    amount = 32 MB; time taken = 0.16 secs; rate = 198.75 MB/s
    amount = 64 MB; time taken = 0.31 secs; rate = 205.78 MB/s
    amount = 128 MB; time taken = 0.61 secs; rate = 209.82 MB/s

    Maybe FileIO.readall() should do something similar for files whose size cannot be determined by stat().
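    What reading "in chunks" amounts to can be sketched in Python (an illustrative model of the RawIOBase.readall() strategy, not the actual implementation):

    ```python
    import io

    def readall_chunked(raw, chunk_size=io.DEFAULT_BUFFER_SIZE):
        # Accumulate fixed-size reads in a list and join once at the end,
        # so the total amount of copying stays linear in the output size.
        chunks = []
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    ```

    Joining once at the end is what keeps this O(n); growing a single bytes object on every iteration would reintroduce the quadratic copying.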

    sbt mannequin added the stdlib (Python modules in the Lib dir) and performance (Performance or resource usage) labels Aug 21, 2012

    pitrou commented Aug 21, 2012

    Yes, I think FileIO.readall() should be fixed rather than avoided.


    sbt mannequin commented Aug 21, 2012

    RawIOBase.readall() does the sensible thing already. Maybe FileIO should be allowed to inherit it.

    The alternative patch (which probably only works for the raw unbuffered case):

    diff -r ca54c27a9045 Lib/subprocess.py
    --- a/Lib/subprocess.py Tue Aug 21 14:54:22 2012 +0100
    +++ b/Lib/subprocess.py Wed Aug 22 00:39:32 2012 +0100
    @@ -1152,7 +1152,7 @@

             def _readerthread(self, fh, buffer):
    -            buffer.append(fh.read())
    +            buffer.append(io.RawIOBase.readall(fh))
                 fh.close()

    produces

    amount = 1 MB; time taken = 0.01 secs; rate = 71.42 MB/s
    amount = 2 MB; time taken = 0.02 secs; rate = 83.33 MB/s
    amount = 4 MB; time taken = 0.04 secs; rate = 105.26 MB/s
    amount = 8 MB; time taken = 0.06 secs; rate = 135.59 MB/s
    amount = 16 MB; time taken = 0.09 secs; rate = 170.20 MB/s
    amount = 32 MB; time taken = 0.17 secs; rate = 190.46 MB/s
    amount = 64 MB; time taken = 0.32 secs; rate = 202.52 MB/s
    amount = 128 MB; time taken = 0.61 secs; rate = 211.56 MB/s


    pitrou commented Aug 21, 2012

    FileIO.readall() already has an overallocation mechanism which should yield linear complexity. Perhaps it needs to be tweaked a bit?
    (look at new_buffersize in Modules/_io/fileio.c)

    By the way, your results are a bit weird. Why does the data rate increase with the transferred amount?


    sbt mannequin commented Aug 22, 2012

    FileIO.readall() already has an overallocation mechanism which should
    yield linear complexity. Perhaps it needs to be tweaked a bit?
    (look at new_buffersize in Modules/_io/fileio.c)

    I think it needs a bit more than a tweak ;-). It looks like the buffer increases by 12.5% at a time, but it does not seem to work as intended.

    By the way, your results are a bit weird. Why does the data rate increase
    with the transferred amount?

    The rates are completely unreliable for those benchmarks which take less than 0.1 seconds. They level off after 0.1 secs when the overheads become insignificant.
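    The heuristic under discussion looks roughly like this when transcribed into Python (paraphrasing new_buffersize() from Modules/_io/fileio.c of that era; the exact constants should be treated as illustrative):

    ```python
    def new_buffersize(currentsize, end=None, pos=0):
        # If fstat()/lseek() yield a usable end-of-file estimate, size the
        # buffer to hold the rest of the file plus one spare byte (so EOF
        # can be detected without a further resize).
        if end is not None and end >= pos >= 0:
            return currentsize + end - pos + 1
        # Otherwise grow by about 12.5%, intended to give amortized
        # linear-time behaviour with less memory overhead than doubling.
        return currentsize + (currentsize >> 3) + 6
    ```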


    sbt mannequin commented Aug 22, 2012

    In each loop before calling read() the buffer size is recalculated based on the amount of space used, causing a realloc *regardless* of how much empty space is left in the buffer. And each read is only producing a smallish chunk (5120 bytes).

    So assuming realloc fails to do an inplace realloc (and Windows is reputedly very bad at that) the amount of data copied will be O(n^2).
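    A toy model makes the asymptotic difference between the two resize policies concrete (illustrative Python, assuming the worst case in which every realloc() copies the whole buffer; the 5120-byte chunk matches the pipe reads mentioned above):

    ```python
    CHUNK = 5120  # smallish per-read() amount observed on the pipe

    def bytes_copied_resize_every_read(total):
        # Old behaviour: recompute/realloc the buffer before every read,
        # even when plenty of slack remains -> O(n^2) copying overall.
        used = copied = 0
        while used < total:
            copied += used              # realloc copies everything buffered
            used += CHUNK
        return copied

    def bytes_copied_resize_when_full(total):
        # Fixed behaviour: grow (here: double) only when the buffer is
        # actually full -> O(n) copying overall.
        used = copied = 0
        size = CHUNK
        while used < total:
            if used + CHUNK > size:
                copied += used          # copy only on the rare growth steps
                size *= 2
            used += CHUNK
        return copied
    ```

    For 32 MB, the first policy copies on the order of a hundred gigabytes in this model, while the second copies less than twice the payload.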


    sbt mannequin commented Aug 22, 2012

    The attached patch (readall-resize.patch) makes the resizes only happen when the buffer is full. It produces

    amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
    amount = 2 MB; time taken = 0.03 secs; rate = 64.10 MB/s
    amount = 4 MB; time taken = 0.06 secs; rate = 64.10 MB/s
    amount = 8 MB; time taken = 0.08 secs; rate = 102.56 MB/s
    amount = 16 MB; time taken = 0.17 secs; rate = 93.24 MB/s
    amount = 32 MB; time taken = 0.31 secs; rate = 102.56 MB/s
    amount = 64 MB; time taken = 0.70 secs; rate = 91.17 MB/s
    amount = 128 MB; time taken = 1.81 secs; rate = 70.73 MB/s


    sbt mannequin commented Aug 22, 2012

    Alternative patch (readall-chunks.patch) which delegates to RawIOBase.readall() if the file cannot be stat-ed or appears to have size zero.

    amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
    amount = 2 MB; time taken = 0.02 secs; rate = 128.21 MB/s
    amount = 4 MB; time taken = 0.03 secs; rate = 128.20 MB/s
    amount = 8 MB; time taken = 0.05 secs; rate = 170.94 MB/s
    amount = 16 MB; time taken = 0.09 secs; rate = 170.94 MB/s
    amount = 32 MB; time taken = 0.16 secs; rate = 205.13 MB/s
    amount = 64 MB; time taken = 0.30 secs; rate = 215.92 MB/s
    amount = 128 MB; time taken = 0.58 secs; rate = 221.76 MB/s


    pitrou commented Aug 22, 2012

    Alternative patch (readall-chunks.patch) which delegates to
    RawIOBase.readall() if the file cannot be stat-ed or appears to have
    size zero.

    It seems we could do both (fix the resizing logic, and fallback on
    chunks if the size is unknown).


    sbt mannequin commented Aug 22, 2012

    It seems we could do both (fix the resizing logic, and fallback on
    chunks if the size is unknown).

    Combined patch attached.


    pitrou commented Aug 22, 2012

    If new_buffersize() is only used with HAVE_FSTAT, then its code can probably be simplified.

    @vstinner

    I'm lost, there are too many patches. Could you please remove the old/useless ones? Patching the subprocess module is not a good solution: if there is an issue in the io module, the io module should be fixed, and then this patch can be removed.


    sbt mannequin commented Aug 23, 2012

    Here is the patch (with the old ones removed).

    Note that the old code mishandled the case where _PyBytes_Resize() failed by assuming that the old bytes object would still be valid.

    I have assumed that stream pseudo-files will never claim to have a size greater than zero. The code will still work if this assumption is false.
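    The combined behaviour can be summarised in Python (a simplified sketch of the intended logic, not the actual fileio.c code; the function name and chunk size are illustrative):

    ```python
    import os

    def readall_sketch(fd, chunk=8192):
        # Sketch of the combined strategy: preallocate using fstat() when it
        # reports a positive size (regular files), and fall back to a list
        # of chunks when the size is unknown or zero (pipes, sockets, and
        # stream pseudo-files such as those under /proc).
        # (The real code also subtracts the current file position; omitted.)
        try:
            size = os.fstat(fd).st_size
        except OSError:
            size = 0
        if size > 0:
            # Regular file: one right-sized read; the chunk loop below mops
            # up any short read or bytes appended after the stat.
            pieces = [os.read(fd, size)]
        else:
            pieces = []
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            pieces.append(buf)
        return b"".join(pieces)
    ```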

    sbt mannequin changed the title from "500x speed up for Popen.communicate() on Windows" to "FileIO.readall() has worst case O(n^2) complexity" Aug 23, 2012

    pitrou commented Aug 23, 2012

    Here is the patch (with the old ones removed).

    Note that the old code mishandled the case where _PyBytes_Resize()
    failed by assuming that the old bytes object would still be valid.

    I have assumed that stream pseudo-files will never claim to have a
    size greater than zero. The code will still work if this assumption
    is false.

    I haven't measured under Windows, but this looks ok to me.


    sbt mannequin commented Aug 24, 2012

    After playing with the patch on Linux it seems that Linux much prefers the realloc() scheme to the list-of-chunks scheme.

    This new patch only does list-of-chunks on Windows.


    sbt mannequin commented Aug 24, 2012

    Attached is an alternative benchmark program which does not use subprocess.
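    The attachment itself was not migrated, but a subprocess-free benchmark along the same lines can be sketched with an OS pipe and a writer thread (illustrative; bench_readall is a made-up name, not the attached script):

    ```python
    import os
    import threading
    import time

    def bench_readall(amount, chunk=65536):
        # Feed `amount` bytes through a pipe from a background thread and
        # time how long FileIO.readall() takes to drain it on this end.
        r, w = os.pipe()

        def writer():
            payload = b"x" * chunk
            sent = 0
            with open(w, "wb", buffering=0) as out:  # closes w -> reader sees EOF
                while sent < amount:
                    sent += out.write(payload[:amount - sent])

        t = threading.Thread(target=writer)
        t.start()
        start = time.perf_counter()
        with open(r, "rb", buffering=0) as f:  # buffering=0 -> raw FileIO
            data = f.readall()
        elapsed = time.perf_counter() - start
        t.join()
        return len(data), elapsed
    ```

    Running it for increasing amounts (e.g. 1 MB up to 128 MB) reproduces the kind of table shown earlier in the thread.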


    pitrou commented Aug 24, 2012

    After playing with the patch on Linux it seems that Linux much prefers
    the realloc() scheme to the list-of-chunks scheme.

    What about the method call overhead in RawIO.readall(), and the
    different progression of buffer sizes? (the realloc scheme uses larger
    and larger read() sizes, while RawIO.readall() uses a constant read()
    size).

    By the way, not every non-Windows OS is Linux, so the patch is wrong.


    sbt mannequin commented Aug 24, 2012

    What about the method call overhead in RawIO.readall(), and the
    different progression of buffer sizes? (the realloc scheme uses larger
    and larger read() sizes, while RawIO.readall() uses a constant read()
    size).

    For this benchmark the call overhead does not seem to be noticeable, and using larger or adaptive read buffers does not seem to help either. (I have tried both on Linux.)

    By the way, not every non-Windows OS is Linux, so the patch is wrong.

    Wrong in the sense of not necessarily optimal for unknown platforms? Well, the patch retains the old (intended) behaviour on other platforms, so I would call that conservative rather than wrong. Are you suggesting switching behaviour depending on whether some macro is defined?

    64-bit Linux with current patch (and using new benchmark):
    amount = 1 MB; time taken = 0.003 secs; rate = 317.22 MB/s
    amount = 2 MB; time taken = 0.007 secs; rate = 283.74 MB/s
    amount = 4 MB; time taken = 0.011 secs; rate = 359.58 MB/s
    amount = 8 MB; time taken = 0.020 secs; rate = 395.58 MB/s
    amount = 16 MB; time taken = 0.030 secs; rate = 528.18 MB/s
    amount = 32 MB; time taken = 0.051 secs; rate = 627.72 MB/s
    amount = 64 MB; time taken = 0.088 secs; rate = 726.36 MB/s
    amount = 128 MB; time taken = 0.133 secs; rate = 960.23 MB/s
    amount = 256 MB; time taken = 0.258 secs; rate = 992.32 MB/s
    amount = 512 MB; time taken = 0.482 secs; rate = 1062.30 MB/s

    On 64-bit Linux with previous patch:
    amount = 1 MB; time taken = 0.006 secs; rate = 158.07 MB/s
    amount = 2 MB; time taken = 0.011 secs; rate = 177.23 MB/s
    amount = 4 MB; time taken = 0.024 secs; rate = 169.32 MB/s
    amount = 8 MB; time taken = 0.047 secs; rate = 170.39 MB/s
    amount = 16 MB; time taken = 0.098 secs; rate = 163.65 MB/s
    amount = 32 MB; time taken = 0.220 secs; rate = 145.19 MB/s
    amount = 64 MB; time taken = 0.253 secs; rate = 253.32 MB/s
    amount = 128 MB; time taken = 0.724 secs; rate = 176.80 MB/s
    amount = 256 MB; time taken = 0.874 secs; rate = 293.02 MB/s
    amount = 512 MB; time taken = 2.292 secs; rate = 223.38 MB/s


    pitrou commented Aug 24, 2012

    For this benchmark the call overhead does not seem to be noticeable,
    and using larger or adaptive read buffers does not seem to help
    either. (I have tried both on Linux.)

    Ok, thank you.

    > By the way, not every non-Windows OS is Linux, so the patch is wrong.

    Wrong in the sense of not necessarily optimal for unknown platforms?
    Well, the patch retains the old (intended) behaviour on other
    platforms, so I would call that conservative rather than wrong.

    Hmm, you are right, there is no regression indeed.
    I guess I don't much like the idea of switching code paths based on the
    platform purely for optimization reasons, but in this case it seems
    useful (and simple enough).

    Are you suggesting switching behaviour depending on whether some macro
    is defined?

    No, that would definitely be overkill.


    sbt mannequin commented May 16, 2013

    I have done an updated patch. It no longer special cases Windows, so realloc() is always used for enlarging the buffer (except when fstat() is missing).

    Antoine, do you think this is ready to commit?


    pitrou commented May 16, 2013

    I posted a couple of review comments.


    sbt mannequin commented May 17, 2013

    Updated patch addressing Antoine's comments.


    sbt mannequin commented May 17, 2013

    New patch.


    pitrou commented May 17, 2013

    Looks fine to me.


    python-dev mannequin commented May 17, 2013

    New changeset e4c303d23d01 by Richard Oudkerk in branch 'default':
    Issue bpo-15758: Fix FileIO.readall() so it no longer has O(n**2) complexity.
    http://hg.python.org/cpython/rev/e4c303d23d01

    @sbt sbt mannequin closed this as completed May 18, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022