Author larry
Date 2006-10-02.04:04:17
Content
The core concept: adding two strings together no longer returns a pure
"string" object.  Instead, it returns a "string concatenation" object
which holds references to the two strings but does not actually
concatenate them... yet.  The strings are concatenated only when someone
requests the string's value, at which point it allocates all the space
it needs and renders the concatenated string all at once.

More to the point, if you add multiple strings together (a + b + c),
it *doesn't* compute the intermediate strings (a + b).
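The idea can be sketched in pure Python. This is only an illustration of the concept, not the patch itself (which is C code inside the string object); the class name LazyConcat and its attributes are hypothetical.

```python
class LazyConcat:
    """Hypothetical stand-in for the "string concatenation" object."""

    def __init__(self, left, right):
        self.left = left      # a str or another LazyConcat
        self.right = right    # a str or another LazyConcat
        self._value = None    # rendered string, filled in on first request

    def __add__(self, other):
        # node + x: record the operands, copy nothing yet.
        return LazyConcat(self, other)

    def __radd__(self, other):
        # x + node: prepending just records the operands too.
        return LazyConcat(other, self)

    def __str__(self):
        if self._value is None:
            self._value = self._render()
            # The operands are no longer needed once rendered.
            self.left = self.right = None
        return self._value

    def _render(self):
        # Walk the tree without recursion, collecting leaves left to right.
        leaves = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, LazyConcat):
                if node._value is not None:
                    leaves.append(node._value)   # already rendered subtree
                else:
                    stack.append(node.right)     # visited after left
                    stack.append(node.left)
            else:
                leaves.append(node)
        # join() sizes the result once and copies each leaf exactly once.
        return "".join(leaves)


expr = LazyConcat("a", "b") + "c"   # no intermediate "ab" is built here
print(str(expr))                    # rendering happens on this request
```

Note that nothing is copied until str() is called, and the intermediate (a + b) exists only as a node holding two references.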

Upsides to this approach:
        * String concatenation using + is now the fastest way to
          concatenate strings (that I know of).

        * In particular, prepending is *way* faster than it used to be.
          It used to be a pathological case: every prepend copied the
          whole string built so far, so the total work was quadratic.
          Now it's linear.

        * Throw off the shackles of "".join([]); you don't need it
          anymore.

        * Did I mention it was faster?
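To put rough numbers on the prepending case, here is a small cost model that counts characters copied. It is a back-of-the-envelope sketch, not a measurement of the patch; both function names are hypothetical, and the eager model assumes every prepend writes out a complete new string.

```python
def eager_prepend_copies(pieces):
    """Characters copied if every prepend builds a full new string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length      # the whole new result is written out
    return copied


def lazy_prepend_copies(pieces):
    """Characters copied if everything is rendered once at the end."""
    return sum(len(piece) for piece in pieces)


pieces = ["x"] * 1000
print(eager_prepend_copies(pieces))   # 500500 (1 + 2 + ... + 1000)
print(lazy_prepend_copies(pieces))    # 1000
```

Under this model the eager cost grows with the square of the number of pieces, while the lazy cost grows linearly with the total length.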

Downsides to this approach:

        * Changes how PyStringObjects are stored internally; ob_sval is
          no longer a char[1], but a char *.  This makes each
          PyStringObject four bytes larger (on a 32-bit build, where a
          pointer is four bytes).

        * Adds another memory dereference in order to get the value of
          a string, which is a teensy-weensy slowdown.

        * Would force a recompile of all C modules that deal directly
          with string objects (which I imagine is most of them).

        * Also, *requires* that C modules use the PyString_AS_STRING()
          macro, rather than casting the object and grabbing ob_sval
          directly.  (I was pleased to see that the Python source
          was very good about using this macro; if all Python C
          modules are this well-behaved, this point is happily moot.)

        * On a related note, the file Mac/Modules/MacOS.c implies
          that there are Mac-specific Python scripts that peer
          directly into string objects.  These would have to be
          changed to understand the new semantics.

        * String concatenation objects are 36 bytes larger than
          string objects, and this space will often go unreclaimed
          after the string is rendered.

        * When rendered, string concatenation objects storing long
          strings will allocate a second buffer from the heap to
          store the string.  So this adds some minor allocation
          overhead (though this is offset by the speed gain from
          the approach overall).

        * Will definitely need some heavy review before it could
          go in; in particular, I worry I got the semantics surrounding
          "interned" strings wrong.
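The PyString_AS_STRING() requirement above can be illustrated with a Python analogy. This is a sketch under the assumption that the accessor is the place where rendering gets triggered; the class LazyString, the field _buf and the method as_string() are hypothetical stand-ins for the C struct, ob_sval and the macro.

```python
class LazyString:
    """Hypothetical model of a string whose buffer is filled on demand."""

    def __init__(self, *parts):
        self._parts = parts
        self._buf = None          # plays the role of ob_sval before rendering

    def as_string(self):
        # Plays the role of the accessor macro: render first, then return.
        if self._buf is None:
            self._buf = "".join(self._parts)
            self._parts = ()
        return self._buf


s = LazyString("spam", "eggs")
direct = s._buf                   # grabbing the field directly: still None
safe = s.as_string()              # going through the accessor: "spameggs"
print(direct, safe)
```

Code that reads the field directly sees an unrendered value, which is the kind of breakage a C module would hit if it bypassed the macro.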
History
Date User Action Args
2007-08-23 15:54:47  admin  link  issue1569040 messages
2007-08-23 15:54:47  admin  create