Author: pitrou
Recipients: benjamin.peterson, daniel.urban, jcon, pitrou, stutzbach
Date: 2011-05-07.09:01:26
SpamBayes Score: 2.4501655e-06
Marked as misclassified: No
Message-id: <1304758887.2.0.143952046493.issue9971@psf.upfronthosting.co.za>
In-reply-to:
Content
Oops... It hadn't jumped out at me earlier, but the patch is actually problematic performance-wise. The reason is that it doesn't buffer data at all, so small readintos become slower (they have to go through raw I/O every time):
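In effect, with the patch every readinto() call maps straight onto a raw read, roughly like this Python-level sketch (illustrative only, not the actual C code):

    class UnbufferedReader:
        # Illustrative only: what an unbuffered readinto() amounts to.
        def __init__(self, raw):
            self.raw = raw  # the underlying raw file object (e.g. FileIO)

        def readinto(self, b):
            # No read-ahead: even a 4-byte request costs one raw
            # (system-level) read, hence the slowdown measured below.
            return self.raw.readinto(b)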

$ ./python -m timeit -s "f=open('LICENSE', 'rb'); b = bytearray(4)" \
  "f.seek(0)" "while f.readinto(b): pass"
-> without patch: 2.53 msec per loop
-> with patch: 3.37 msec per loop

$ ./python -m timeit -s "f=open('LICENSE', 'rb'); b = bytearray(128)" \
  "f.seek(0)" "while f.readinto(b): pass"
-> without patch: 90.3 usec per loop
-> with patch: 103 usec per loop

The patch does make large reads faster, as expected:

$ ./python -m timeit -s "f=open('LICENSE', 'rb'); b = bytearray(4096)" \
  "f.seek(0)" "while f.readinto(b): pass"
-> without patch: 13.2 usec per loop
-> with patch: 6.71 usec per loop

(that's a good reminder for the future: when optimizing something, always try to measure the "improvement" :-))

One solution would be to refactor _bufferedreader_read_generic() to accept an existing buffer and write into it.
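Something along these lines, written here as a Python-level sketch of the idea (the names and the exact logic are only illustrative; the real change would be in bufferedio.c):

    class BufferAwareReader:
        """Sketch only: serve small readintos from an internal buffer,
        and bypass the buffer for requests at least buffer_size big."""

        def __init__(self, raw, buffer_size=8192):
            self.raw = raw                # underlying raw file object (e.g. FileIO)
            self.buffer_size = buffer_size
            self._buf = b""               # internal read-ahead buffer
            self._pos = 0                 # current position inside self._buf

        def readinto(self, b):
            view = memoryview(b)
            written = 0
            while written < len(b):
                avail = len(self._buf) - self._pos
                if avail:
                    # Copy out of the internal buffer first.
                    take = min(avail, len(b) - written)
                    view[written:written + take] = self._buf[self._pos:self._pos + take]
                    self._pos += take
                    written += take
                elif len(b) - written >= self.buffer_size:
                    # Large request: read directly into the caller's buffer,
                    # skipping the intermediate copy (the point of the patch).
                    n = self.raw.readinto(view[written:])
                    if not n:
                        break
                    written += n
                else:
                    # Small request: refill the internal buffer so that the
                    # following small readintos don't each hit raw I/O.
                    chunk = self.raw.read(self.buffer_size)
                    if not chunk:
                        break
                    self._buf, self._pos = chunk, 0
            return written

That way small readintos cost roughly one raw read per buffer_size bytes instead of one per call, while large readintos keep the zero-copy fast path that makes the 4096-byte case above faster.
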
History
Date                 User    Action  Args
2011-05-07 09:01:27  pitrou  set     recipients: + pitrou, benjamin.peterson, stutzbach, daniel.urban, jcon
2011-05-07 09:01:27  pitrou  set     messageid: <1304758887.2.0.143952046493.issue9971@psf.upfronthosting.co.za>
2011-05-07 09:01:26  pitrou  link    issue9971 messages
2011-05-07 09:01:26  pitrou  create