Title: optimization for append-only StringIO
Type: performance Stage: resolved
Components: IO Versions: Python 3.3
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: loewis, pitrou, python-dev, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2011-10-11 01:17 by pitrou, last changed 2011-11-10 21:52 by pitrou. This issue is now closed.

File name Uploaded Description
stringio.patch pitrou, 2011-10-11 01:17 review
Messages (6)
msg145322 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-11 01:17
io.StringIO is considerably slower than ''.join() when used for mass concatenation (around 5x slower). This patch brings it to similar performance by deferring construction of the internal buffer until it is needed.
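
To check the gap on a given build, a rough timing sketch along these lines may help. The function names, piece size, and repetition counts are arbitrary choices for illustration, not taken from the patch, and the ratio will vary by interpreter version and platform.

```python
import io
import timeit


def concat_stringio(pieces):
    """Concatenate by writing each piece to an io.StringIO."""
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


def concat_join(pieces):
    """Concatenate with a single ''.join() call."""
    return ''.join(pieces)


if __name__ == "__main__":
    # Hypothetical workload: many small strings.
    pieces = ["x" * 10] * 100000
    for fn in (concat_stringio, concat_join):
        elapsed = timeit.timeit(lambda: fn(pieces), number=10)
        print(fn.__name__, round(elapsed, 3))
```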

The problem is that it's very easy to disable the optimization by calling a method other than write() and getvalue().
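
The idea can be illustrated in pure Python. This is only a sketch of the general technique, not the C code in stringio.patch: the class name `AppendOnlyBuffer` and its attributes are hypothetical. Writes are kept in a list, and the real buffer is only built when a method other than write() or getvalue() is called, which is exactly what turns the fast path off.

```python
import io


class AppendOnlyBuffer:
    """Hypothetical illustration of a deferred internal buffer."""

    def __init__(self):
        self._chunks = []   # pending writes, not yet joined
        self._real = None   # real io.StringIO, built only when needed

    def write(self, s):
        if self._real is not None:
            return self._real.write(s)
        self._chunks.append(s)  # fast path: just remember the piece
        return len(s)

    def getvalue(self):
        if self._real is not None:
            return self._real.getvalue()
        return ''.join(self._chunks)

    def _realize(self):
        # Build the real buffer once; the fast path is disabled afterwards.
        if self._real is None:
            self._real = io.StringIO()
            self._real.write(''.join(self._chunks))
            self._chunks = []
        return self._real

    def seek(self, pos, whence=0):
        return self._realize().seek(pos, whence)

    def read(self, size=-1):
        return self._realize().read(size)
```

In this sketch a single seek() or read() is enough to fall back to the ordinary buffer for the rest of the object's life.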
msg145400 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-12 15:57
It would be interesting to see how often the "bad" case triggers, i.e. how often a write-only StringIO sees any of the other methods invoked at all.
As a special case, you may consider that .truncate(0) doesn't really need to realize the buffer first.

I also wonder how much StringIO will be used in practice, as opposed to BytesIO.
msg145404 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-12 16:06
Yes, these are things I've been wondering about. The use-case for an append-only StringIO is obviously overlapping with the use-case for using ''.join(). However, the implementation I'm proposing is better than ''.join() when writing very small strings, since there's a periodic consolidation.
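
As a rough illustration of what periodic consolidation can mean (a sketch under assumptions, not the committed implementation): small pieces are collected in a pending list, and once that list reaches a threshold they are joined into one larger block, so the number of live small string objects stays bounded. The class name, the `max_pending` parameter, and the `consolidations` counter are all hypothetical.

```python
class ConsolidatingAccumulator:
    """Hypothetical accumulator that periodically joins small pieces."""

    def __init__(self, max_pending=1024):
        self._blocks = []       # already consolidated, larger strings
        self._pending = []      # recent small pieces
        self._max_pending = max_pending
        self.consolidations = 0

    def write(self, s):
        self._pending.append(s)
        if len(self._pending) >= self._max_pending:
            # Join only the pending pieces; earlier blocks are not recopied.
            self._blocks.append(''.join(self._pending))
            self._pending = []
            self.consolidations += 1
        return len(s)

    def getvalue(self):
        return ''.join(self._blocks) + ''.join(self._pending)
```

A plain list passed to ''.join() keeps every small string alive until the final join, which is where this kind of scheme could help with very small writes.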

> As a special case, you may consider that .truncate(0) doesn't really
> need to realize the buffer first.

True. Also, seek(0) then read() could use the same optimization.
msg145571 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-10-14 23:20
Like parts of the Python test suite, I use StringIO to capture print/write output for testing in an output...output/getvalue/reset(seek(0),truncate(0)) cycle. While this enhancement would not currently affect me (as I only do a few prints each cycle), I can easily imagine other cases where it would.
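
For reference, the cycle described above looks roughly like this with the standard io.StringIO API. The helper name `capture_cycles` and its input format are made up for the example.

```python
import io


def capture_cycles(batches):
    """Capture printed output per batch, resetting the buffer in between."""
    buf = io.StringIO()
    results = []
    for batch in batches:
        for item in batch:
            print(item, file=buf)       # output ... output
        results.append(buf.getvalue())  # getvalue
        buf.seek(0)                     # reset: rewind ...
        buf.truncate(0)                 # ... and drop the old contents
    return results
```

Each cycle here calls seek() and truncate(), so whether it stays on an optimized path depends on how those two calls are handled.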
msg147411 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-10 20:56
New changeset 8d9a869db675 by Antoine Pitrou in branch 'default':
Issue #13149: Speed up append-only StringIO objects.
msg147412 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-10 21:52
I've committed an improved version (which also optimizes seek(0); read()).
Date                 User         Action  Args
2011-11-10 21:52:54  pitrou       set     status: open -> closed; resolution: fixed; messages: + msg147412; stage: resolved
2011-11-10 20:56:53  python-dev   set     nosy: + python-dev; messages: + msg147411
2011-10-14 23:20:28  terry.reedy  set     nosy: + terry.reedy; messages: + msg145571
2011-10-12 16:06:52  pitrou       set     messages: + msg145404
2011-10-12 15:57:41  loewis       set     nosy: + loewis; messages: + msg145400
2011-10-11 01:17:34  pitrou       create