Message 115889 - Python tracker


Author	vstinner
Recipients	amaury.forgeotdarc, vstinner
Date	2010-09-08.18:17:14
Message-id	<1283969838.58.0.438644096176.issue9769@psf.upfronthosting.co.za>

Content
> My remark is that utf-8 tend to be applied to all kind of files;
> if someone once decide that non-ascii chars are allowed in (some) 
> string constants, they will be stored in utf-8.

In that case, it would be better to raise an error on any non-ASCII byte (character) in the format string. Raising an error is better than interpreting UTF-8 as ISO-8859-1 (mojibake!). Since nobody noticed this bug (PyUnicode_FromFormat/PyErr_Format expects ISO-8859-1), I suppose that nobody uses non-ASCII format strings: in practice the format string is always ASCII.
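To make the mojibake concrete, here is a small Python sketch of what happens when UTF-8 bytes are read as ISO-8859-1. The format string "café: %s" is a made-up example, not something taken from CPython's sources.

```python
# Hypothetical format string with one non-ASCII character, stored as UTF-8
# (as it would be in a UTF-8 encoded C source file).
fmt_utf8 = "caf\u00e9: %s".encode("utf-8")

# Decoding as ISO-8859-1 never fails: every byte maps to some character,
# so the two UTF-8 bytes of U+00E9 silently become two wrong characters.
as_latin1 = fmt_utf8.decode("iso-8859-1")

# Decoding as ASCII fails loudly instead of producing garbage.
try:
    fmt_utf8.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_latin1))
print(ascii_ok)
```

The ISO-8859-1 result is one character longer than the intended text, and nothing signals the problem; the ASCII decode is the only one of the two that reports it.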

Python built-in errors are not localised. If an application uses gettext, I suppose that the error is raised in Python code, not through the C API.

The attached patch changes PyUnicode_FromFormatV (and so PyUnicode_FromFormat and PyErr_Format) to reject any non-ASCII byte (character) in the format string. I added a test and documented the format string encoding (which is now ASCII). See also #9738 for the documentation about function argument encoding.
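The check described above can be modelled in a few lines of Python. This is only a sketch of the idea: the real patch is C code, and the function name `check_ascii_format`, the exception type and the error message used here are assumptions for illustration, not what the patch actually contains.

```python
def check_ascii_format(fmt: bytes) -> str:
    """Return fmt decoded as ASCII, or raise on the first non-ASCII byte.

    Hypothetical helper: it mimics the "reject non-ASCII bytes" rule,
    not the actual C implementation.
    """
    for index, byte in enumerate(fmt):
        if byte > 127:
            raise ValueError(
                f"format string must be ASCII, "
                f"got byte 0x{byte:02x} at index {index}"
            )
    return fmt.decode("ascii")


# A plain ASCII format string passes through unchanged.
accepted = check_ascii_format(b"%s: expected %d items")

# A UTF-8 encoded non-ASCII character is rejected at its first byte.
try:
    check_ascii_format("caf\u00e9: %s".encode("utf-8"))
    rejected_message = None
except ValueError as exc:
    rejected_message = str(exc)
```

Failing on the first offending byte keeps the error close to its cause, which is the point of preferring an exception over a silent ISO-8859-1 decode.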
