Message 51669 - Python tracker


Author	larry
Date	2007-01-12.02:42:10

Content

lemburg:

You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and it could conceivably result in crashes.  To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering, which means only in out-of-memory conditions.  For now, while the patch is under early review, I suspect that's okay.
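
To make the failure mode concrete, here is a minimal sketch in plain C.  It is not code from the patch and does not use the real Unicode API: LazyConcat, lazy_concat(), lazy_as_str() and the alloc_fn hook are all hypothetical names, and char stands in for Py_UNICODE.  The only point is that the length is known up front while the flat buffer is built on first request, so the accessor is where an allocation failure surfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lazily concatenated string: it remembers
 * its two halves and only builds the flat buffer on first request. */
typedef struct {
    const char *left;
    const char *right;
    size_t length;   /* known up front, without rendering */
    char *str;       /* flat buffer; NULL until rendered */
} LazyConcat;

/* Illustrative allocator hook so a caller can simulate out-of-memory. */
typedef void *(*alloc_fn)(size_t);

static void *failing_alloc(size_t n) { (void)n; return NULL; }

static LazyConcat lazy_concat(const char *left, const char *right)
{
    assert(left != NULL && right != NULL);
    LazyConcat s = { left, right, strlen(left) + strlen(right), NULL };
    return s;
}

/* Analogue of asking for the buffer: renders on demand, and returns
 * NULL if the allocation for the final buffer fails. */
static const char *lazy_as_str(LazyConcat *s, alloc_fn alloc)
{
    if (s->str == NULL) {
        char *buf = alloc(s->length + 1);
        if (buf == NULL)
            return NULL;                 /* the new failure mode */
        size_t n = strlen(s->left);
        memcpy(buf, s->left, n);
        /* Copy the right half including its terminating NUL. */
        memcpy(buf + n, s->right, s->length - n + 1);
        s->str = buf;
    }
    return s->str;
}
```

A caller that passes failing_alloc gets NULL back and the object stays unrendered, so a later call with a working allocator (malloc, say) can still succeed.  Any caller that dereferences the result without checking is where a crash would come from.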

So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.  Document this with strong language for external C module authors.
2. Change the length to 0 and return a constant empty string.  Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end.  Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
4. The patch is not accepted.
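
As a rough illustration of option 2, here is a sketch in plain C.  Again, none of this is the patch's code or the real Unicode API: LazyStr, lazy_as_str(), lazy_get_size(), count_chars() and the alloc_fn hook are hypothetical names, and char stands in for Py_UNICODE.  It assumes the accessor is allowed to mutate the object when rendering fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy string: the length is known before the buffer exists. */
typedef struct {
    const char *left, *right;  /* unrendered pieces */
    size_t length;
    const char *str;           /* NULL until rendered */
    int owns_str;              /* 1 if str came from the allocator */
} LazyStr;

/* Illustrative allocator hook so a caller can simulate out-of-memory. */
typedef void *(*alloc_fn)(size_t);
static void *failing_alloc(size_t n) { (void)n; return NULL; }

static const char EMPTY[1] = "";

/* Option 2: never return NULL.  On allocation failure, change the
 * length to 0 and hand back a constant empty string. */
static const char *lazy_as_str(LazyStr *s, alloc_fn alloc)
{
    if (s->str == NULL) {
        size_t n = strlen(s->left);
        char *buf = alloc(s->length + 1);
        if (buf == NULL) {
            s->length = 0;
            s->str = EMPTY;
            s->owns_str = 0;
        } else {
            memcpy(buf, s->left, n);
            memcpy(buf + n, s->right, s->length - n + 1);
            s->str = buf;
            s->owns_str = 1;
        }
    }
    return s->str;
}

static size_t lazy_get_size(const LazyStr *s) { return s->length; }

/* Safe caller order: pointer first, length second. */
static size_t count_chars(LazyStr *s, alloc_fn alloc, char c)
{
    const char *p = lazy_as_str(s, alloc);  /* may reset the length */
    size_t len = lazy_get_size(s);          /* read after rendering */
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == c)
            hits++;
    return hits;
}
```

A caller that read the length before asking for the pointer would still see the old length and walk off the end of the one-byte buffer, which is why the ordering advice matters.  Option 3 differs only in handing back a larger preallocated buffer to soften that mistake.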

Of course, I'm open to suggestions of other approaches.  (Not to mention patches!)


Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today.  "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices".  For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev:
http://mail.python.org/pipermail/python-dev/2006-October/069506.html
And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.


As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained.  For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c".  I freely admit my patch makes those files *even fussier* to work on than they already are.  But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower.  At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible-NULL-return-from-PyUnicode_AS_UNICODE() problem, that is.

History
Date	User	Action	Args
2007-08-23 15:56:03	admin	link	issue1629305 messages
2007-08-23 15:56:03	admin	create