Author methane
Recipients methane
Date 2009-05-03.01:55:22
The built-in compile() expects its source to be encoded in UTF-8.
This behavior makes it harder to implement alternative shells
like IDLE and IPython. ( and are related bugs.)

Below is the current behavior of compile().

# Python's interactive shell in Windows cp932 console.
>>> "あ"
>>> u"あ"

# compile() fails to decode str.
>>> code = compile('u"あ"', '__interactive__', 'single')
>>> exec code
u'\x82\xa0'  # u'\u3042' expected.

# compile() encodes unicode to utf-8.
>>> code = compile(u'"あ"', '__interactive__', 'single')
>>> exec code
'\xe3\x81\x82' # '\x82\xa0' (cp932) wanted, but I get utf-8.
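
The byte values in the session above can be reproduced with a small sketch. It is written for Python 3 (an assumption; the session above is Python 2), and the latin-1 step is only an illustration that matches the observed u'\x82\xa0', not a statement about how compile() is implemented.

```python
# Python 3 sketch: one character, different byte sequences per encoding.
text = "あ"  # U+3042

cp932_bytes = text.encode("cp932")  # what a cp932 console sends
utf8_bytes = text.encode("utf-8")   # what compile() assumes for str source

# Mapping each cp932 byte to its own code point gives the same value as the
# mis-decoded unicode literal shown in the session above.
misdecoded = cp932_bytes.decode("latin-1")

print(cp932_bytes)        # b'\x82\xa0'
print(utf8_bytes)         # b'\xe3\x81\x82'
print(ascii(misdecoded))  # '\x82\xa0'
```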

Currently, a PEP 263 coding declaration must be prepended, as shown below,
to make compile() produce code in the expected encoding.

>>> code = compile('# coding: cp932\n%s' % ('"あ"',), '__interactive__', 'single')
>>> exec code
>>> code = compile('# coding: cp932\n%s' % ('u"あ"',), '__interactive__', 'single')
>>> exec code
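
For comparison, here is a minimal sketch of the same workaround in Python 3, where compile() accepts a bytes source and honors a PEP 263 coding line. The helper name compile_with_cookie and the '<input>' filename are hypothetical, chosen only for this illustration.

```python
def compile_with_cookie(source_bytes, encoding, filename="<input>", mode="exec"):
    """Hypothetical helper: prepend a PEP 263 coding line, then compile."""
    cookie = ("# coding: %s\n" % encoding).encode("ascii")
    return compile(cookie + source_bytes, filename, mode)


# Source as raw cp932 bytes, as a cp932 console would deliver it.
raw = 'value = "あ"\n'.encode("cp932")

namespace = {}
exec(compile_with_cookie(raw, "cp932"), namespace)
print(namespace["value"] == "あ")  # True
```

One side effect of this approach is that the extra first line shifts every reported line number by one.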

But I feel that using compile() with a PEP 263 declaration is a bit of a dirty hack.
I think adding an 'encoding' argument to compile(), with 'utf-8' as its default
value, is a cleaner way, and it doesn't break backward compatibility.

The following example describes the proposed behavior of compile() with the encoding option.

# coding: utf-8 (in utf-8 context)
code = compile('"あ"', '', 'single')
exec code #=> '\xe3\x81\x82'

code = compile('"あ"', '', 'single', encoding='cp932') => 

code = compile(u'"あ"', '', 'single')
exec code #=> '\xe3\x81\x82'

code = compile(u'"あ"', '', 'single', encoding='cp932')
exec code #=> '\x82\xa0'
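
The proposal can be approximated with a wrapper. This is a rough Python 3 sketch under stated assumptions: compile_with_encoding is a hypothetical name, compile() itself has no 'encoding' argument, and only the byte-string case is covered, because Python 3 string literals are always text and the unicode-source case above has no direct equivalent.

```python
def compile_with_encoding(source, filename, mode, encoding="utf-8"):
    """Hypothetical wrapper: decode a bytes source, then call compile()."""
    if isinstance(source, (bytes, bytearray)):
        # Decode explicitly instead of prepending a coding line.
        source = bytes(source).decode(encoding)
    return compile(source, filename, mode)


raw = 'value = "あ"\n'.encode("cp932")

namespace = {}
exec(compile_with_encoding(raw, "<input>", "exec", encoding="cp932"), namespace)
print(namespace["value"] == "あ")  # True

# With the default 'utf-8', the cp932 bytes fail to decode.
try:
    compile_with_encoding(raw, "<input>", "exec")
except UnicodeDecodeError:
    print("cp932 bytes are not valid utf-8")
```

Unlike the coding-line workaround, this leaves the source text and its line numbers unchanged.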
Date                 User     Action  Args
2009-05-03 01:55:25  methane  set     recipients: + methane
2009-05-03 01:55:25  methane  set     messageid: <>
2009-05-03 01:55:23  methane  link    issue5911 messages
2009-05-03 01:55:22  methane  create