Author lemburg
Recipients Arfrever, amaury.forgeotdarc, lemburg, loewis, vstinner
Date 2009-08-19.12:49:56
Message-id <4A8BF4F2.8020603@egenix.com>
In-reply-to <1250685240.55.0.880064208852.issue6697@psf.upfronthosting.co.za>
Content
Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
> The problem is actually wider::
>     >>> getattr(None, "\udc80")
>     Segmentation fault
> An idea would be to change _PyUnicode_AsDefaultEncodedString and allow
> unpaired surrogates (utf8+surrogateescape, as explained in PEP383), but
> I fear the consequences...
>
> The code that fails seems pretty common:
> 	PyErr_Format(PyExc_AttributeError,
> 		     "'%.50s' object has no attribute '%.400s'",
> 		     tp->tp_name, _PyUnicode_AsString(name));
> It would be unfortunate to replace all usages of _PyUnicode_AsString to
> check the return value.

The use of _PyUnicode_AsString() is wrong here. It returns NULL on
failure, and there are several cases where it can fail, e.g.
MemoryErrors, embedded NULs, encoding errors.

The same is true for _PyUnicode_AsStringAndSize(), which is why
I made both functions interpreter-private APIs before 3.0
shipped.

If you want a fail-safe stringified version of a Unicode object,
your only choice is to create a new API that does error checking,
properly clears the error and then returns a reference to a constant
string, e.g. "<repr-error>".
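
To make the idea concrete, here is a rough Python-level model of what
such a helper could do. It is only a sketch: the real helper would be a
C function living next to _PyUnicode_AsString(), and the names
failsafe_utf8 and attribute_error_message are hypothetical. Treating an
embedded NUL as a failure is also an assumption, on the grounds that the
result is meant to be used as a C string.

```python
# Constant placeholder returned whenever conversion fails.
FALLBACK = b"<repr-error>"


def failsafe_utf8(text):
    """Return UTF-8 bytes for text, or FALLBACK if conversion fails.

    Hypothetical model of the proposed fail-safe API: the error is
    checked and swallowed here, so callers never see a failure.
    """
    try:
        encoded = text.encode("utf-8")
    except (UnicodeError, MemoryError):
        # Encoding errors (e.g. lone surrogates) and memory errors are
        # cleared and replaced by the constant string.
        return FALLBACK
    if b"\x00" in encoded:
        # Assumption: an embedded NUL would truncate a C string, so it
        # is reported as a failure as well.
        return FALLBACK
    return encoded


def attribute_error_message(type_name, attr_name):
    """Build the AttributeError text without risking a second error."""
    shown = failsafe_utf8(attr_name).decode("utf-8")
    return "'%.50s' object has no attribute '%.400s'" % (type_name, shown)


if __name__ == "__main__":
    print(attribute_error_message("NoneType", "\udc80"))
```

With a helper like this, code that formats an error message can use the
result directly, because the fallback string is always valid.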
History
Date User Action Args
2009-08-19 12:50:08 lemburg set recipients: + lemburg, loewis, amaury.forgeotdarc, vstinner, Arfrever
2009-08-19 12:49:56 lemburg link issue6697 messages
2009-08-19 12:49:56 lemburg create