Author larry
Date 2007-01-12.02:42:10
Content
lemburg:

You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes.  To be clear: a NULL return can only happen when allocation of the final "str" buffer fails during lazy rendering, that is, in out-of-memory conditions.  For now, while the patch is under early review, I suspect that's okay.

So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.  Document this with strong language for external C module authors.
2. Change the length to 0 and return a constant empty string.  Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end.  Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
4. The patch is not accepted.
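
To make option 2 concrete, here is a minimal standalone sketch in plain C.  It is not code from the patch and does not use the real Unicode object layout: LazyString, lazy_init(), lazy_as_buffer(), lazy_get_size(), lazy_free(), the lazy_alloc hook, failing_alloc() and count_char() are all hypothetical names invented for illustration, and char stands in for Py_UNICODE.  It only shows why a caller that asks for the pointer *first* and the length *second* stays safe when rendering fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lazily concatenated string.
 * This is NOT the layout used by the actual patch. */
typedef struct {
    const char *left;   /* unrendered left operand */
    const char *right;  /* unrendered right operand */
    char *str;          /* rendered buffer, NULL until first requested */
    size_t length;      /* length reported to callers */
    int failed;         /* set once a render attempt has failed */
} LazyString;

static const char EMPTY[1] = "";

/* Allocator hook (an assumption of this sketch) so that the
 * out-of-memory path can be exercised deliberately. */
static void *(*lazy_alloc)(size_t) = malloc;

void *failing_alloc(size_t n) { (void)n; return NULL; }

void lazy_init(LazyString *s, const char *left, const char *right)
{
    s->left = left;
    s->right = right;
    s->str = NULL;
    s->length = strlen(left) + strlen(right);
    s->failed = 0;
}

/* Option 2: if the final buffer cannot be allocated, change the
 * length to 0 and return a constant empty string instead of NULL. */
const char *lazy_as_buffer(LazyString *s)
{
    assert(s != NULL);
    if (s->str != NULL)
        return s->str;          /* already rendered */
    if (s->failed)
        return EMPTY;           /* an earlier render already failed */

    size_t left_len = strlen(s->left);
    size_t right_len = strlen(s->right);
    char *buf = (char *)lazy_alloc(left_len + right_len + 1);
    if (buf == NULL) {
        s->failed = 1;
        s->length = 0;          /* callers that read the length now see 0 */
        return EMPTY;
    }
    memcpy(buf, s->left, left_len);
    memcpy(buf + left_len, s->right, right_len + 1);  /* copies the NUL too */
    s->str = buf;
    return buf;
}

size_t lazy_get_size(const LazyString *s) { return s->length; }

void lazy_free(LazyString *s) { free(s->str); s->str = NULL; }

/* A well-behaved caller: pointer first, length second. */
size_t count_char(LazyString *s, char c)
{
    const char *p = lazy_as_buffer(s);  /* may reset the length to 0 */
    size_t n = lazy_get_size(s);        /* read only after the pointer */
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] == c)
            hits++;
    return hits;
}
```

A caller that read the length before the pointer would, on allocation failure, still hold the stale non-zero length and walk off the end of the one-byte empty string; that is the hazard option 3 tries to soften with a larger spare buffer.  Under option 1, lazy_as_buffer() would instead return NULL and count_char() would need an explicit check before the loop.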

Of course, I'm open to suggestions of other approaches.  (Not to mention patches!)


Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today.  "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices".  For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev:
http://mail.python.org/pipermail/python-dev/2006-October/069506.html
And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.


As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained.  For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c".  I freely admit my patch makes those files *even fussier* to work on than they already are.  But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower.  At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible-NULL-return-from-PyUnicode_AS_UNICODE() problem, that is.
History
Date User Action Args
2007-08-23 15:56:03 admin link issue1629305 messages
2007-08-23 15:56:03 admin create