FileIO.readall() has worst case O(n^2) complexity #59962
Comments
Piping significant amounts of data through a subprocess using Popen.communicate() is crazily slow on Windows. The attached program just pushes data through mingw's cat.exe (timings for Python 3.3 and 2.7 attached). For Python 3.3 this looks like O(n^2) complexity to me; 2.7 is better, but still struggles for large amounts. Changing Popen._readerthread() to read in chunks rather than using FileIO.readall() produces a huge speedup on Python 3.3 (timings with the patch also attached). Maybe FileIO.readall() should do something similar for files whose size cannot be determined by stat().
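The attached benchmark program is not reproduced in the thread; a minimal sketch of the kind of measurement described above might look like the following. It assumes a `cat` executable is on PATH (the report used mingw's cat.exe on Windows) and is illustrative, not the reporter's actual script.

```python
# Sketch: time Popen.communicate() while piping increasing amounts of
# data through `cat`.  Assumes `cat` is on PATH.
import subprocess
import time

def bench(n_bytes):
    """Push n_bytes through cat and return the elapsed time in seconds."""
    data = b"x" * n_bytes
    start = time.perf_counter()
    proc = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    out, _ = proc.communicate(data)
    elapsed = time.perf_counter() - start
    assert out == data  # sanity check: cat echoes its input
    return elapsed

for mb in (1, 2, 4, 8):
    t = bench(mb * 2 ** 20)
    print("amount = %3d MB; time taken = %.2f secs" % (mb, t))
```

With the unpatched FileIO.readall(), the time per megabyte grows with the total amount instead of staying roughly constant.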
Yes, I think FileIO.readall() should be fixed rather than avoided.
RawIOBase.readall() does the sensible thing already. Maybe FileIO should be allowed to inherit it. The alternative patch (which probably only works for the raw unbuffered case):

diff -r ca54c27a9045 Lib/subprocess.py
     def _readerthread(self, fh, buffer):
-        buffer.append(fh.read())
+        buffer.append(io.RawIOBase.readall(fh))
         fh.close()

produces:

amount = 1 MB; time taken = 0.01 secs; rate = 71.42 MB/s
amount = 2 MB; time taken = 0.02 secs; rate = 83.33 MB/s
amount = 4 MB; time taken = 0.04 secs; rate = 105.26 MB/s
amount = 8 MB; time taken = 0.06 secs; rate = 135.59 MB/s
amount = 16 MB; time taken = 0.09 secs; rate = 170.20 MB/s
amount = 32 MB; time taken = 0.17 secs; rate = 190.46 MB/s
amount = 64 MB; time taken = 0.32 secs; rate = 202.52 MB/s
amount = 128 MB; time taken = 0.61 secs; rate = 211.56 MB/s
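For reference, the "sensible thing" RawIOBase.readall() does can be expressed as a list-of-chunks strategy: read fixed-size chunks until EOF and join them once at the end, so the total copying is O(n). This is a hedged sketch of the strategy, not CPython's actual implementation.

```python
# Sketch of the list-of-chunks strategy: O(n) total copying, regardless
# of whether the stream's size can be determined in advance.
import io

def readall_chunks(raw):
    chunks = []
    while True:
        chunk = raw.read(io.DEFAULT_BUFFER_SIZE)  # typically 8192 bytes
        if not chunk:          # b'' signals EOF on a blocking stream
            break
        chunks.append(chunk)
    return b"".join(chunks)    # one final O(n) concatenation
```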
FileIO.readall() already has an overallocation mechanism which should yield linear complexity. Perhaps it needs to be tweaked a bit? By the way, your results are a bit weird. Why does the data rate increase with the transferred amount?
I think it needs a bit more than a tweak ;-) It looks like the buffer increases by 12.5% at a time, but this does not seem to work as intended.
The rates are completely unreliable for those benchmarks which take less than 0.1 seconds; they level off after 0.1 secs, when the overheads become insignificant.
In each loop, before calling read(), the buffer size is recalculated based on the amount of space used, causing a realloc *regardless* of how much empty space is left in the buffer. And each read only produces a smallish chunk (5120 bytes). So, assuming realloc fails to do an in-place realloc (and Windows is reputedly very bad at that), the amount of data copied is O(n^2).
The attached patch (readall-resize.patch) makes the resizes happen only when the buffer is full. It produces:

amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
amount = 2 MB; time taken = 0.03 secs; rate = 64.10 MB/s
amount = 4 MB; time taken = 0.06 secs; rate = 64.10 MB/s
amount = 8 MB; time taken = 0.08 secs; rate = 102.56 MB/s
amount = 16 MB; time taken = 0.17 secs; rate = 93.24 MB/s
amount = 32 MB; time taken = 0.31 secs; rate = 102.56 MB/s
amount = 64 MB; time taken = 0.70 secs; rate = 91.17 MB/s
amount = 128 MB; time taken = 1.81 secs; rate = 70.73 MB/s
Alternative patch (readall-chunks.patch) which delegates to RawIOBase.readall() if the file cannot be stat-ed or appears to have size zero:

amount = 1 MB; time taken = 0.02 secs; rate = 64.10 MB/s
amount = 2 MB; time taken = 0.02 secs; rate = 128.21 MB/s
amount = 4 MB; time taken = 0.03 secs; rate = 128.20 MB/s
amount = 8 MB; time taken = 0.05 secs; rate = 170.94 MB/s
amount = 16 MB; time taken = 0.09 secs; rate = 170.94 MB/s
amount = 32 MB; time taken = 0.16 secs; rate = 205.13 MB/s
amount = 64 MB; time taken = 0.30 secs; rate = 215.92 MB/s
amount = 128 MB; time taken = 0.58 secs; rate = 221.76 MB/s
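The fallback logic can be sketched in Python: if fstat() yields a usable size, read in one go; otherwise (pipes, character devices, and sockets typically report size 0) fall back to chunked reading. This is a simplified model of the approach, it ignores the current file position and short reads on regular files, which the real C code must handle.

```python
# Sketch: choose a readall strategy based on whether fstat() can tell
# us the size.  Pipes and other stream pseudo-files report st_size 0.
import io
import os

def readall(fd):
    try:
        size = os.fstat(fd).st_size
    except OSError:
        size = 0
    if size <= 0:
        # Size unknown: list-of-chunks strategy, O(n) copying.
        chunks = []
        while True:
            chunk = os.read(fd, io.DEFAULT_BUFFER_SIZE)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    # Size known: a single appropriately-sized read.  (A real
    # implementation would still loop, since reads may come up short.)
    return os.read(fd, size)
```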
It seems we could do both (fix the resizing logic, and fall back on RawIOBase.readall() when the file's size cannot be determined).
Combined patch attached.
If new_buffersize() is only used with HAVE_FSTAT, then its code can probably be simplified. |
I'm lost, there are too many patches. Could you please remove the old/useless ones? Patching the subprocess module is not a good solution: if there is an issue in the io module, the io module should be fixed, and then this patch can be removed.
Here is the patch (with the old ones removed). Note that the old code mishandled the case where _PyBytes_Resize() failed by assuming that the old bytes object would still be valid. I have assumed that stream pseudo-files will never claim to have a size greater than zero; the code will still work if this assumption is false.
I haven't measured under Windows, but this looks ok to me.
After playing with the patch on Linux it seems that Linux much prefers the realloc() scheme to the list-of-chunks scheme. This new patch only does list-of-chunks on Windows.
Attached is an alternative benchmark program which does not use subprocess.
What about the method call overhead in RawIOBase.readall(), and the size of the read chunks? By the way, not every non-Windows OS is Linux, so the patch is wrong.
For this benchmark the call overhead does not seem to be noticeable, and using larger or adaptive read buffers does not seem to help either. (I have tried both on Linux.)
Wrong in the sense of not necessarily optimal for unknown platforms? Well, the patch retains the old (intended) behaviour on other platforms, so I would call that conservative rather than wrong. Are you suggesting switching behaviour depending on whether some macro is defined? Results for 64-bit Linux with the current patch (using the new benchmark) and with the previous patch are attached.
Ok, thank you.
Hmm, you are right, there is no regression indeed.
No, that would definitely be overkill.
I have done an updated patch. It no longer special-cases Windows, so realloc() is always used for enlarging the buffer (except when fstat() is missing). Antoine, do you think this is ready to commit?
I posted a couple of review comments. |
Updated patch addressing Antoine's comments.
New patch. |
Looks fine to me.
New changeset e4c303d23d01 by Richard Oudkerk in branch 'default'.