Message51668
Larry, I probably wasn't clear enough:
PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API.
Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons).
In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory).
A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string.
I'm not convinced that murking with the underlying object type and doing "subtyping" on-the-fly is a clean design.
|
|
Date |
User |
Action |
Args |
2007-08-23 15:56:02 | admin | link | issue1629305 messages |
2007-08-23 15:56:02 | admin | create | |
|