classification
Title: efficient string concatenation
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: arigo, gvanrossum, mcherm, pedronis, rhettinger
Priority: normal Keywords: patch

Created on 2004-06-27 13:39 by arigo, last changed 2004-08-06 18:47 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
string_concat.context-diff arigo, 2004-06-27 20:39 patch
string_concat_deref.diff arigo, 2004-06-28 09:41 added STORE_DEREF support
string_concat_simple.diff arigo, 2004-07-25 10:17 simplified and commented
ceval.diff rhettinger, 2004-08-01 16:34 Extended to include BINARY_ADD
ceval3.diff rhettinger, 2004-08-01 18:42 Do BINARY_ADD and INPLACE_ADD separately
Messages (17)
msg46249 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-06-27 13:39
A wild idea that makes repeated string concatenations efficient without changing stringobject.c.

If we assume we don't want to change the representation of strings, then the problem is that string_concat(v,w) doesn't know if v will soon released, so it cannot resize it in-place even if the refcnt is 1.

But with some hacks ceval.c can know that.  Statements like s=s+expr or s+=expr both compile to a BINARY_ADD or INPLACE_ADD followed by a STORE_FAST or STORE_NAME.  So in the attached patch ceval.c special-cases addition of two strings (in the same way as it special-cases addition of two integers already).  If moreover the addition is followed by a STORE that is about to overwrite the addition's left argument, and if the refcnt is right, then the left argument can be resized in-place (plus some obscure magic to ensure that everything is still valid even if resize moves the string in memory).

With Python's good memory manager, repeated resizes even without manual over-allocation perform nicely.

As a side effect, other constructions like a+b+c+d+e+f also work in-place now.

The patch would do with a lot more comments.
msg46250 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-06-27 20:39
Logged In: YES 
user_id=4771

resubmitted as a context diff.
msg46251 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-06-28 09:41
Logged In: YES 
user_id=4771

another patch with support for STORE_DEREF (thanks Phillip for pointing it out)
msg46252 - (view) Author: Michael Chermside (mcherm) Date: 2004-06-28 11:38
Logged In: YES 
user_id=99874

Hmmm... Interesting. I kinda like it.
msg46253 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-07-25 10:17
Logged In: YES 
user_id=4771

Here is another patch.  This one focuses on simplicity, both
implementation-wise and from the user's point of view:

1) It only makes repeated  variable += expr  faster. It
doesn't touch the '+'.
2) It doesn't mess with the internals of strings and dicts
any more.  It is just one well-documented function now.

The goal of this new patch is to be reviewable and
maintainable, to get it in the core to stop people from
being bitten by the performance of += (I just saw yet
another example yesterday).
msg46254 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-07-25 10:18
Logged In: YES 
user_id=4771

Raymond, do you have time to review it?
msg46255 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-01 15:29
Logged In: YES 
user_id=80475

I've run with this for a couple of days and have not been
able to break it.   My benchmarks run slightly faster except
for PyBench which is 0.1% slower.  Regression tests pass
without exception.

The code itself looks fine and the approach makes sense.  It
does introduce some extra complexity and code duplication,
but the patch is still unified and comprehensible.
msg46256 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-01 16:34
Logged In: YES 
user_id=80475

FWIW, I think the patch ought to also apply to BINARY_ADD. 
There is tons of code that writes a=a+b rather than a+=b.  
It's important that there not be a substantial performance
difference between the two; otherwise, we're introducing
another bit of arcana.

Attached is a revised patch that combines BINARY_ADD with
INPLACE_ADD for a net savings of instructions in ceval.c. 
The timings improve on all my benchmarks (PyBench for
example runs 4% faster).
msg46257 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-01 18:42
Logged In: YES 
user_id=80475

Further timings show it is advantageous to keep BINARY_ADD
and INPLACE_ADD separate.  Revised patch attached.  Now all
benchmark tests show improvements.
msg46258 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-08-02 11:10
Logged In: YES 
user_id=4771

Optimizing BINARY_ADD was what I did in the first patch too, but it has the drawback of being harder to explain.  The main trouble I see is that a=a+b+c will NOT be accelerated, while a=a+b or a+=b+c will.

The rationale for only tampering with INPLACE_ADD was to make the result more predictable.  Moreover the behavior would be superficially similar to what occurs with lists with a=a+b vs a+=b.

In-place operators in Python are a dark corner of the language.  I think it is OK for people to think about it as a possibly optimized (in-place-modifying) version of the standard operators.  Thus the idea of only tampering with INPLACE_ADD for the optimization here.

Well, maybe you can find a convincingly clear explanation of why a=a+b works but a=a+b+c doesn't.
msg46259 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-02 15:09
Logged In: YES 
user_id=80475

I recommend doing both.  When and where the optimization
applies is an implementation detail (that probably won't be
replicated in Jython for example).

To do only inplace add is to miss much of the existing code
that is in need of help.  Several library modules and
benchmarks benefit greatly from doing both.  This makes
common coding styles not result in disasterous performance.  

The explanation can read something like:
"With binary additions of the form a+=b or a=a+b, CPython is
able to concatenate inplace and achieve linear running time.
 However, with chained additions like a = a + b + c, the
current implementation does not recognize the opportunity to
concatenate inplace.  Instead, a new string is constructed
at each step resulting in quadratic running time."

msg46260 - (view) Author: Samuele Pedroni (pedronis) * (Python committer) Date: 2004-08-03 11:19
Logged In: YES 
user_id=61408

well, given the refcnt dependence and other issues, this cannot
be reproduced in Jython (and similar impl. of Python).

the issue is then that the std library itself cannot depend
on the optimisation because the std library is shared among
impls.
msg46261 - (view) Author: Michael Chermside (mcherm) Date: 2004-08-03 12:37
Logged In: YES 
user_id=99874

While I think CPython should get the patch, I *am* worried
about Jython (and Iron Python I suppose). In particular,
Samuele's plea to keep this out of the std library is likely
to be followed for a few months, then completely forgotten.
Perhaps we need some way to check, when running the test
suite, just how much dependence there is on this feature...
then there's at least be a way to *find* the bits where
people who didn't predate this change failed to use the
obscure ''.join() idiom that us oldtimers keep nagging about
<wink>.
msg46262 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-08-03 14:13
Logged In: YES 
user_id=6380

Not so fast, see python-dev.
msg46263 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-08-05 15:36
Logged In: YES 
user_id=6380

This can go in on two conditions:

1) That is clearly documented as a quality-of-implementation
issue, not a language feature

2) That it is not relied upon in the standard library (which
is shared with Jython)
msg46264 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-05 23:00
Logged In: YES 
user_id=80475

I;ve got it from here.  Will document the coding restriction
in the style guilde and will put the quality of
implementation notes directly in the string docs.
msg46265 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-06 18:47
Logged In: YES 
user_id=80475

Applied as:

Python/ceval.c 2.413
Doc/lib/libstdtypes.tex 1.160
nondist/peps/pep-0008.txt 1.25
History
Date User Action Args
2004-06-27 13:39:19arigocreate