Issue2523
Created on 2008-03-31 18:11 by pitrou, last changed 2008-05-08 14:35 by pitrou.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | Remove |
| binaryio1.patch | pitrou, 2008-05-08 02:14 | |||
| binaryio2.patch | pitrou, 2008-05-08 14:35 | |||
| Messages | |||
|---|---|---|---|
| msg64788 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-03-31 18:11 | |
In py3k, buffered binary IO can be quadratic when e.g. reading a whole file.
This is a small test on 50KB, 100KB and 200KB files:
-> py3k with buffering:
./python -m timeit -s "f = open('50KB', 'rb')" "f.seek(0); f.read()"
1000 loops, best of 3: 286 usec per loop
./python -m timeit -s "f = open('100KB', 'rb')" "f.seek(0); f.read()"
1000 loops, best of 3: 1.07 msec per loop
./python -m timeit -s "f = open('200KB', 'rb')" "f.seek(0); f.read()"
100 loops, best of 3: 4.85 msec per loop
-> py3k without buffering (just the raw FileIO layer):
./python -m timeit -s "f = open('50KB', 'rb', buffering=0)" "f.seek(0);
f.read()"
10000 loops, best of 3: 46 usec per loop
./python -m timeit -s "f = open('100KB', 'rb', buffering=0)" "f.seek(0);
f.read()"
10000 loops, best of 3: 88.7 usec per loop
./python -m timeit -s "f = open('200KB', 'rb', buffering=0)" "f.seek(0);
f.read()"
10000 loops, best of 3: 156 usec per loop
-> for comparison, Python 2.5:
python -m timeit -s "f = open('50KB', 'rb')" "f.seek(0); f.read()"
10000 loops, best of 3: 34.4 usec per loop
python -m timeit -s "f = open('100KB', 'rb')" "f.seek(0); f.read()"
10000 loops, best of 3: 62.3 usec per loop
python -m timeit -s "f = open('200KB', 'rb')" "f.seek(0); f.read()"
10000 loops, best of 3: 119 usec per loop
I'm posting this issue as a reminder, but perhaps someone is already
working on this, or the goal is to translate it to C ultimately?
|
|||
| msg65236 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-04-09 11:21 | |
By the way, a simple way to fix it would be to use a native BytesIO object (as provided by Alexandre's patch in #1751) rather than a str object for the underlying buffer. |
|||
| msg66390 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-05-08 02:14 | |
Here is a pure Python patch removing the quadratic behaviour and trying
to make read operations generally faster.
Here are some numbers:
./python -m timeit -s "f = open('50KB', 'rb')" "f.seek(0)" "while
f.read(11): pass"
-> py3k without patch: 23.6 msec per loop
-> py3k with patch: 14.5 msec per loop
-> Python 2.5: 4.72 msec per loop
./python -m timeit -s "f = open('50KB', 'rb')" "f.seek(0); f.read()"
-> py3k without patch: 284 usec per loop
-> py3k with patch: 90.1 usec per loop
-> Python 2.5: 33.8 usec per loop
./python -m timeit -s "f = open('100KB', 'rb')" "f.seek(0); f.read()"
-> py3k without patch: 828 usec per loop
-> py3k with patch: 142 usec per loop
-> Python 2.5: 62.5 usec per loop
./python -m timeit -s "f = open('200KB', 'rb')" "f.seek(0); f.read()"
-> py3k without patch: 3.67 msec per loop
-> py3k with patch: 375 usec per loop
-> Python 2.5: 131 usec per loop
And, for the record, with a 10MB file:
./python -m timeit -s "f = open('10MB', 'rb')" "f.seek(0); f.read()"
-> py3k without patch: still running after more than one minute, gave up
-> py3k with patch: 38.6 msec per loop
-> Python 2.5: 20.4 msec per loop
|
|||
| msg66393 (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) | Date: 2008-05-08 02:48 | |
I see that the code is still using the immutable bytes object for its buffer (which forces Python to create a new buffer every time its modified). Also, I think it worthwhile to check if using a pre-allocated bytearray object (i.e., bytearray(buffer_size) where `buffer_size` is an integer) would have any performance benefits. |
|||
| msg66396 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-05-08 03:12 | |
Hi Alexandre, I first tried to use a (non-preallocated) bytearray object and, after trying several optimization schemes, I found out that the best one worked as well with an immutable bytes object :) I also found out that the bytes <-> bytearray conversion costs can be noticeable in some benchmarks. The internal buffer is rarely reallocated because the current offset inside it is remembered instead; also, when reading bytes from the underlying unbuffered stream, a list of bytes objects is accumulated and then joined at the end. I think a preallocated bytearray would not make a lot of sense since we can't readinto() an arbitrary position, so we still have a memory copy from the bytes object returned by raw.read() to the bytearray buffer, and then when returning the result to the user as a bytes object we have another memory copy. In other words each read byte is copied twice more. Of course, if this code was rewritten in C, different compromises would be possible. cheers Antoine. |
|||
| msg66417 (view) | Author: Antoine Pitrou (pitrou) | Date: 2008-05-08 14:35 | |
Some code relies on -1 being usable as the default value for read() (instead of None), this new patch conforms to this expectation. It fixes some failures in test_mailbox. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2008-05-08 14:35:36 | pitrou | set | files:
+ binaryio2.patch messages: + msg66417 |
| 2008-05-08 03:12:46 | pitrou | set | messages: + msg66396 |
| 2008-05-08 02:48:48 | alexandre.vassalotti | set | messages: + msg66393 |
| 2008-05-08 02:14:55 | pitrou | set | files:
+ binaryio1.patch keywords: + patch messages: + msg66390 |
| 2008-05-06 19:28:51 | alexandre.vassalotti | set | nosy: + alexandre.vassalotti |
| 2008-04-09 11:21:25 | pitrou | set | messages: + msg65236 |
| 2008-03-31 18:11:24 | pitrou | create | |