
classification
Title: poor cStringIO.StringO seek performance
Type: performance
Stage:
Components:
Versions: Python 2.7
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: fdrake
Nosy List: boogenhagn, fdrake, pitrou, r.david.murray
Priority: normal
Keywords: patch

Created on 2010-10-07 17:37 by boogenhagn, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: cStringIO.diff
Uploaded: boogenhagn, 2010-10-07 17:37
Description: cStringIO patch
Messages (11)
msg118123 - (view) Author: Patrick Strawderman (boogenhagn) Date: 2010-10-07 17:37
cStringIO.StringO's seek method has O(n) characteristics in certain,
albeit pathological, cases, while the pure Python implementation and
cStringIO.StringI's seek methods both execute in constant time in all cases.

When the file offset is set n bytes beyond the end of actual data,
the gap is filled in with n bytes in cStringIO.StringO's seek method.

However, POSIX states that reads of data in the gap will return null bytes
only if a subsequent write has taken place, so filling in the gap is not
required at the time of the seek.

This patch for 2.7 corrects the behavior by unifying StringO and StringI's
seek methods, and moving the writing of null bytes to StringO's write
method.  There may be a more elegant way to write this; I don't know.
I believe this issue affects Python 3 as well, though I have yet to
test it.

NOTE: Perhaps this seems like an extreme edge case not worthy of a fix, but
this actually caused problems for us when parsing images with malformed
EXIF data; a web request for uploading such a photo was taking on the order
of 15 minutes.  When we stopped using cStringIO.StringO, it took seconds.
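
The approach described above (a constant-time seek, with null-byte padding deferred to the next write) can be sketched in pure Python. This is only an illustration of the idea, not the C patch attached to this issue; the class name LazySeekBuffer and its methods are hypothetical.

```python
class LazySeekBuffer(object):
    """Hypothetical write buffer that defers gap filling until write()."""

    def __init__(self):
        self._buf = bytearray()
        self._pos = 0

    def seek(self, offset):
        # O(1): only record the new position; the buffer is not touched.
        if offset < 0:
            raise ValueError("negative seek position")
        self._pos = offset
        return self._pos

    def write(self, data):
        gap = self._pos - len(self._buf)
        if gap > 0:
            # Pad with null bytes only now, when a write makes the gap real.
            self._buf.extend(b"\x00" * gap)
        end = self._pos + len(data)
        # Overwrites existing bytes and extends the buffer if needed.
        self._buf[self._pos:end] = data
        self._pos = end

    def getvalue(self):
        return bytes(self._buf)


buf = LazySeekBuffer()
buf.seek(2 ** 30)        # returns immediately; nothing is allocated
empty = buf.getvalue()   # still empty, no write has happened
buf.seek(4)
buf.write(b"ab")         # the 4-byte gap is filled here
padded = buf.getvalue()
```

With this layout the cost of padding is paid once, by the write that needs it, so repeated seeks far past the end of the data stay cheap.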
msg118124 - (view) Author: Patrick Strawderman (boogenhagn) Date: 2010-10-07 17:39
The second sentence should have said "the gap is filled in with n null bytes"
msg118126 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-07 18:26
I'm changing the versions to just 2.7 (though I'm not sure this can be considered a bug fix), since StringIO is reimplemented as part of io in 3.x.
msg118147 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-07 22:31
I don't think there's much point in fixing this. 2.7 users can use io.BytesIO, which is a fast type implemented in C.
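
For reference, a short sketch of the io.BytesIO behaviour being suggested, as it can be observed on current CPython: seeking past the end does not grow the buffer, and the gap is padded with null bytes only when a write follows. The variable names are illustrative.

```python
import io

buf = io.BytesIO()
buf.seek(8)                      # position past the end; no data is written
before_write = buf.getvalue()    # buffer is still empty
buf.write(b"xy")                 # the write pads the 8-byte gap with nulls
after_write = buf.getvalue()
```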
msg118215 - (view) Author: Patrick Strawderman (boogenhagn) Date: 2010-10-08 18:55
Fair enough, but there is a great deal of existing code that already 
uses cStringIO.
msg118353 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-11 12:53
Causing perfectly good Python 2 applications to degrade in performance is bad, even if something else is available.

This should be fixed as a regression.
msg118354 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-11 12:57
> This should be fixed as a regression.

As far as I understand, this is not a regression. I don't think the cStringIO code has changed in years.
msg118355 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-11 13:01
Ok, reading more carefully, it's not a regression.  But it's certainly a bug, and should be fixed.
msg118356 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-11 13:04
> Ok, reading more carefully, it's not a regression.  But it's certainly
> a bug, and should be fixed.

Right. The patch looks straightforward, but I'm not familiar with the
cStringIO code. Could you take a look?
msg118357 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-11 13:06
Assigning to myself for review.
msg118382 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-10-11 19:13
Committed with minor changes in r85366 (release27-maint branch).
History
Date User Action Args
2022-04-11 14:57:07  admin  set  github: 54254
2010-10-11 19:13:56  fdrake  set  status: open -> closed; keywords: - needs review; resolution: accepted; messages: + msg118382
2010-10-11 13:06:12  fdrake  set  keywords: + needs review; assignee: fdrake; messages: + msg118357
2010-10-11 13:04:19  pitrou  set  messages: + msg118356
2010-10-11 13:01:15  fdrake  set  messages: + msg118355
2010-10-11 12:57:24  pitrou  set  messages: + msg118354
2010-10-11 12:53:04  fdrake  set  messages: + msg118353
2010-10-11 12:48:09  fdrake  set  nosy: + fdrake
2010-10-08 18:55:46  boogenhagn  set  messages: + msg118215
2010-10-07 22:31:01  pitrou  set  messages: + msg118147
2010-10-07 18:26:57  r.david.murray  set  nosy: + r.david.murray; messages: + msg118126; versions: - Python 2.6, Python 3.1, Python 3.2, Python 3.3
2010-10-07 18:25:34  r.david.murray  set  nosy: + pitrou
2010-10-07 17:39:59  boogenhagn  set  messages: + msg118124
2010-10-07 17:39:02  boogenhagn  set  components: - None
2010-10-07 17:38:19  boogenhagn  set  type: performance; components: + None
2010-10-07 17:37:48  boogenhagn  create