Message169385
The problem is that IDLE passes a UTF-8 encoded source string to compile, and compile, in the absence of a source encoding declaration, uses the PEP 263 default source encoding, i.e. Latin-1.
As a consequence, the variable s has the value
u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 \xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
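To illustrate how that value arises, here is a minimal sketch. It assumes the original literal was the Cyrillic text that the bytes above spell out when read as UTF-8; the variable names are made up for the example.

```python
# Text recovered by reading the bytes in the value above as UTF-8
# (assumption: this is what was typed in the IDLE shell).
text = u"\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u0442\u0435\u043a\u0441\u0442"

# IDLE hands compile() the UTF-8 encoded form of the source ...
utf8_bytes = text.encode("utf-8")

# ... and each byte is then interpreted as one Latin-1 character.
misdecoded = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no byte was lost.
recovered = misdecoded.encode("latin-1").decode("utf-8")
```

Every Cyrillic letter takes two bytes in UTF-8, so the 13-character string turns into 25 Latin-1 characters.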
IDLE's "Default Source Encoding" is irrelevant - it only applies to editor windows.
One possible fix is the attached patch. However, this patch isn't right, since it will cause all source to be interpreted as UTF-8. This would be wrong when sys.stdin.encoding is not UTF-8 and byte string objects are created in interactive mode.
Interactive mode manages to get it right by looking up sys.stdin.encoding during compilation, but it does so only when in interactive mode (i.e. when tok->prompt != NULL).
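A rough sketch of why forcing UTF-8 would change byte strings. The cp1251 console is purely an assumption picked for illustration; any non-UTF-8 sys.stdin.encoding would show the same effect.

```python
# One character as typed by the user: CYRILLIC CAPITAL LETTER ER.
typed = u"\u0420"

# Bytes a cp1251 console (assumed for this example) would deliver,
# and which a byte string literal should therefore contain.
terminal_bytes = typed.encode("cp1251")

# Bytes the literal would contain if all source were treated as UTF-8.
utf8_bytes = typed.encode("utf-8")

same_bytes = terminal_bytes == utf8_bytes
```

Here the console sends the single byte 0xD0, while the UTF-8 form is the two bytes 0xD0 0xA0, so a byte string created in interactive mode would no longer match what the user typed.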
I don't see any way to fix this problem in Python 2. It is fixed in Python 3, basically by always assuming that the source encoding is UTF-8, by making all string objects Unicode objects, and by disallowing non-ASCII characters in bytes literals.
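The Python 3 behaviour can be checked with a short sketch like the following. It is meant to be run under Python 3 only; "<example>" is a placeholder filename and the variable names are made up for the example.

```python
# Python 3 only. Same Cyrillic text as in the report, written with escapes.
expected = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u0442\u0435\u043a\u0441\u0442"

# UTF-8 encoded source without any coding declaration.
source = ('s = "' + expected + '"\n').encode("utf-8")

# compile() decodes the bytes as UTF-8, so the string survives intact.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
decoded_ok = namespace["s"] == expected

# A bytes literal containing a non-ASCII character is a syntax error.
try:
    compile('b = b"\u00e9"\n', "<example>", "exec")
    bytes_literal_rejected = False
except SyntaxError:
    bytes_literal_rejected = True
```

Since str is always Unicode and bytes literals are restricted to ASCII, there is no literal left whose value depends on the encoding of the console.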
Date | User | Action | Args
2012-08-29 14:19:12 | loewis | set | recipients: + loewis, ezio.melotti, asvetlov, alex.hartwig
2012-08-29 14:19:12 | loewis | set | messageid: <1346249952.38.0.762248489122.issue15809@psf.upfronthosting.co.za>
2012-08-29 14:19:11 | loewis | link | issue15809 messages
2012-08-29 14:19:11 | loewis | create |