
Author loewis
Recipients alex.hartwig, asvetlov, ezio.melotti, loewis
Date 2012-08-29.14:19:10
Message-id <1346249952.38.0.762248489122.issue15809@psf.upfronthosting.co.za>
Content
The problem is that IDLE passes a UTF-8 encoded source string to compile(), and compile(), in the absence of a source encoding declaration, uses the PEP 263 default source encoding, i.e. Latin-1.

As a consequence, the variable s has the value

u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 \xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
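
The effect can be reproduced outside IDLE. The following is a minimal sketch, written for Python 3 so that it is easy to run today; it does not call IDLE or compile(), it only shows what happens when UTF-8 bytes are decoded as Latin-1. The original text "Русский текст" is inferred from the byte values above, and the variable names are illustrative.

```python
# Text as the user typed it (inferred from the escaped bytes above).
text = u"Русский текст"

# What IDLE hands over: the text encoded as UTF-8.
utf8_bytes = text.encode("utf-8")

# What a Latin-1 interpretation makes of those bytes:
# every byte becomes one code point in the range U+0000..U+00FF.
misdecoded = utf8_bytes.decode("latin-1")

expected = (u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 '
            u'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82')
assert misdecoded == expected

# The damage is reversible as long as the wrong encoding was Latin-1.
recovered = misdecoded.encode("latin-1").decode("utf-8")
assert recovered == text
```

Each Cyrillic letter occupies two bytes in UTF-8, which is why the mis-decoded string has two characters per letter.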

IDLE's "Default Source Encoding" is irrelevant - it only applies to editor windows.

One solution is the attached patch. However, this patch isn't right, since it causes all source to be interpreted as UTF-8. That would be wrong when sys.stdin.encoding is not UTF-8 and byte string objects are created in interactive mode.

Interactive mode manages to get it right by looking up sys.stdin.encoding during compilation, but it does so only when in interactive mode (i.e. when tok->prompt != NULL).
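
To illustrate why a hard-coded UTF-8 assumption is not enough, here is a rough Python 3 sketch of the idea of decoding typed input with the stream's own encoding. It is not the tokenizer's actual C logic; decode_typed_source and FakeStdin are hypothetical names introduced for this example, and cp1251 is just an assumed non-UTF-8 terminal encoding.

```python
import sys


def decode_typed_source(raw_bytes, stream=None):
    """Decode typed bytes using the encoding the input stream reports.

    Hypothetical helper for illustration; falls back to UTF-8 when the
    stream reports no encoding.
    """
    stream = sys.stdin if stream is None else stream
    encoding = getattr(stream, "encoding", None) or "utf-8"
    return raw_bytes.decode(encoding)


class FakeStdin(object):
    """Stand-in for sys.stdin on a terminal that uses cp1251 (assumed)."""
    encoding = "cp1251"


# Bytes as they would arrive from such a terminal.
typed = u"Русский текст".encode("cp1251")

# Using the stream's encoding gives back the intended text.
decoded = decode_typed_source(typed, FakeStdin())

# Blindly assuming UTF-8 fails for the same bytes.
try:
    typed.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

With these bytes the UTF-8 attempt raises UnicodeDecodeError, while the lookup through the stream's encoding yields the original text.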

I don't see any way to fix this problem in Python 2. It is fixed in Python 3, basically by always assuming that the source encoding is UTF-8, by making all string objects Unicode objects, and by disallowing non-ASCII characters in bytes literals.
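
The Python 3 behaviour described above can be checked with a short sketch like the following. It assumes a Python 3 interpreter; the "<demo>" filename and the variable names are arbitrary.

```python
# Source handed to compile() as UTF-8 bytes, with no encoding declaration.
source = u"s = 'Русский текст'\n".encode("utf-8")

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

# Python 3 assumes UTF-8, so the string literal survives intact.
decoded_ok = namespace["s"] == u"Русский текст"

# Non-ASCII characters in a bytes literal are rejected at compile time.
try:
    compile(u"b'Русский текст'", "<demo>", "eval")
    bytes_literal_rejected = False
except SyntaxError:
    bytes_literal_rejected = True
```

Both flags end up True on Python 3: the str literal is decoded as UTF-8, and the bytes literal never compiles, so the ambiguity from Python 2 cannot arise.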
History
Date User Action Args
2012-08-29 14:19:12 loewis set recipients: + loewis, ezio.melotti, asvetlov, alex.hartwig
2012-08-29 14:19:12 loewis set messageid: <1346249952.38.0.762248489122.issue15809@psf.upfronthosting.co.za>
2012-08-29 14:19:11 loewis link issue15809 messages
2012-08-29 14:19:11 loewis create