Message 201657 - Python tracker

Author	vstinner
Recipients	Arfrever, ezio.melotti, serhiy.storchaka, vstinner
Date	2013-10-29.19:19:01
Message-id	<1383074341.74.0.754154238423.issue19424@psf.upfronthosting.co.za>

Content

> I don't see a benefit from this patch.

Oh, sorry, I forgot to explain the motivation. The performance of the warnings module is not critical. The motivation here is correctness: the patch avoids encoding strings to UTF-8. For example, _PyUnicode_AsString(filename) fails if the filename contains a surrogate character.

>>> warnings.warn_explicit("text", RuntimeError, "filename", 5)
filename:5: RuntimeError: text
>>> warnings.warn_explicit("text", RuntimeError, "filename\udc80", 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 8: surrogates not allowed
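
The failure above can be reproduced in pure Python without the warnings module. The following is only a minimal sketch: the helper can_encode_utf8() is a hypothetical name introduced for illustration, and the surrogateescape comparison is an addition that is not part of the patch under discussion.

```python
# A filename containing a lone surrogate (U+DC80), as in the session above.
filename = "filename\udc80"


def can_encode_utf8(text):
    """Return True if text can be encoded with the strict UTF-8 codec.

    Hypothetical helper, used here only to illustrate the failure.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


plain_ok = can_encode_utf8("filename")    # ordinary text encodes fine
surrogate_ok = can_encode_utf8(filename)  # strict UTF-8 rejects U+DC80

# For comparison only: the surrogateescape error handler maps U+DC80
# back to the raw byte 0x80 instead of raising.
escaped = filename.encode("utf-8", "surrogateescape")
```

Any code path that insists on a strict UTF-8 encoding of such a string fails in the same way, which is why avoiding the encoding step matters for correctness.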


Another example, where a string is encoded to UTF-8 and then decoded from UTF-8 a few instructions later:

PyObject *to_str = PyObject_Str(item);
err_str = _PyUnicode_AsString(to_str);
...
PyErr_Format(PyExc_RuntimeError, "...%s", err_str);

Using "%R" avoids any encoding conversion.
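
A rough Python-level analogy of the two approaches is sketched below. It is an illustration under assumptions, not the C code from the patch: the value of item is hypothetical, the explicit encode/decode stands in for the _PyUnicode_AsString() plus "%s" round trip, and "%r" stands in for the C-level "%R".

```python
item = "name\udc80"  # hypothetical value containing a lone surrogate

# Round trip through UTF-8, analogous to _PyUnicode_AsString() + "%s".
try:
    roundtrip_msg = "...%s" % str(item).encode("utf-8").decode("utf-8")
except UnicodeEncodeError:
    roundtrip_msg = None  # the strict UTF-8 encoder rejects U+DC80

# Formatting from the object itself, analogous to "%R": no encoding step.
direct_msg = "...%r" % (item,)
```

In this sketch the round trip fails while the direct formatting succeeds, because the repr of the string escapes the surrogate instead of trying to encode it.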

History
Date	User	Action	Args
2013-10-29 19:19:01	vstinner	set	recipients: + vstinner, ezio.melotti, Arfrever, serhiy.storchaka
2013-10-29 19:19:01	vstinner	set	messageid: <1383074341.74.0.754154238423.issue19424@psf.upfronthosting.co.za>
2013-10-29 19:19:01	vstinner	link	issue19424 messages
2013-10-29 19:19:01	vstinner	create