
Title: built-in compile() should take encoding option.
Type: enhancement Stage: test needed
Components: Interpreter Core, Unicode Versions: Python 3.2
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, Trundle, benjamin.peterson, facundobatista, methane
Priority: normal Keywords: patch

Created on 2009-05-03 01:55 by methane, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name | Uploaded | Description
compile_with_encoding.patch | methane, 2009-10-03 16:23 | add encoding option and test.
Messages (6)
msg86994 - Author: Inada Naoki (methane) (Python committer) Date: 2009-05-03 01:55
The built-in compile() expects its source to be encoded in UTF-8.
This behavior makes it harder to implement alternative shells
such as IDLE and IPython. ( and are related bugs.)

Below is the current behavior of compile().

# Python's interactive shell in Windows cp932 console.
>>> "あ"
'\x82\xa0'
>>> u"あ"
u'\u3042'

# compile() fails to decode str.
>>> code = compile('u"あ"', '__interactive__', 'single')
>>> exec code
u'\x82\xa0'  # u'\u3042' expected.

# compile() encodes unicode to utf-8.
>>> code = compile(u'"あ"', '__interactive__', 'single')
>>> exec code
'\xe3\x81\x82' # '\x82\xa0' (cp932) wanted, but I get utf-8.

Currently, a PEP 263 coding declaration, as shown below, is needed to get
the code compiled in the expected encoding.

>>> code = compile('# coding: cp932\n%s' % ('"あ"',), '__interactive__', 'single')
>>> exec code
>>> code = compile('# coding: cp932\n%s' % ('u"あ"',), '__interactive__', 'single')
>>> exec code
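
For comparison, here is a minimal sketch of the same coding-declaration workaround on Python 3, where compile() accepts a bytes object and honors a PEP 263 declaration on its first line. The filename "<cookie-demo>" and the variable names are placeholders chosen for illustration, not part of the original report.

```python
# Python 3 sketch: compile() on bytes source that carries a PEP 263 declaration.
# "\u3042" is the character HIRAGANA LETTER A used throughout this report.
source_text = '# coding: cp932\nvalue = "\u3042"\n'
source_bytes = source_text.encode("cp932")  # the literal becomes b'\x82\xa0'

# Because the source is bytes, the declaration tells the parser how to decode it.
code = compile(source_bytes, "<cookie-demo>", "exec")

namespace = {}
exec(code, namespace)
print(ascii(namespace["value"]))
```

The string literal is decoded from cp932 by the parser itself, so no separate decoding step is needed before calling compile().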

But I feel that using compile() with PEP 263 is a bit of a dirty hack.
I think adding an 'encoding' argument to compile(), with 'utf-8' as its
default value, is a cleaner way, and it doesn't break backward compatibility.

The following example describes the behavior of compile() with the encoding option.

# coding: utf-8 (in utf-8 context)
code = compile('"あ"', '', 'single')
exec code #=> '\xe3\x81\x82'

code = compile('"あ"', '', 'single', encoding='cp932') => 

code = compile(u'"あ"', '', 'single')
exec code #=> '\xe3\x81\x82'

code = compile(u'"あ"', '', 'single', encoding='cp932')
exec code #=> '\x82\xa0'
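
The proposal can be approximated on Python 3 with a small wrapper. This is only a sketch under stated assumptions: compile_with_encoding is a hypothetical helper name, not the attached patch and not a real built-in signature, and it covers only the decoding of bytes input. It does not reproduce the Python 2 behavior of re-encoding byte-string literals shown above.

```python
def compile_with_encoding(source, filename, mode, encoding="utf-8"):
    """Hypothetical helper: decode bytes with `encoding`, then call compile()."""
    if isinstance(source, (bytes, bytearray)):
        # Decode explicitly instead of prepending a coding declaration.
        source = bytes(source).decode(encoding)
    return compile(source, filename, mode)


# Bytes as they might arrive from a cp932 console.
raw = '"\u3042"'.encode("cp932")

code = compile_with_encoding(raw, "<interactive>", "eval", encoding="cp932")
result = eval(code)

# With the default encoding the same bytes are not valid UTF-8.
try:
    compile_with_encoding(raw, "<interactive>", "eval")
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

A shell written this way would pass its console encoding explicitly, which is the role the proposed 'encoding' argument was meant to play.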
msg93501 - Author: Inada Naoki (methane) (Python committer) Date: 2009-10-03 16:23
add sample implementation.
msg93897 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2009-10-12 14:55
The patch as it currently stands is unacceptable because it changes
public APIs.
msg114814 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-24 20:12
Anyone interested in producing an updated patch?
msg114879 - Author: Inada Naoki (methane) (Python committer) Date: 2010-08-25 03:11
This problem is not serious on Python 3, because a Python 3 byte string
literal can't directly contain non-ASCII characters.
So passing a unicode string to compile() is good enough for all cases I can imagine.
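
A short Python 3 sketch of the two points made here: a str source compiles without any encoding handling, and a bytes literal containing a non-ASCII character is rejected at compile time. The filename "<shell>" and the variable names are placeholders for illustration.

```python
# 1. A str source needs no encoding information at all.
text_source = '"\u3042"'
value = eval(compile(text_source, "<shell>", "eval"))

# 2. A bytes literal with a non-ASCII character is a syntax error on Python 3,
#    so the ambiguity from the Python 2 examples cannot occur.
try:
    compile('b"\u3042"', "<shell>", "eval")
    bytes_literal_rejected = False
except SyntaxError:
    bytes_literal_rejected = True

print(ascii(value), bytes_literal_rejected)
```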
msg114880 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2010-08-25 04:12
I'll close this then.
Date | User | Action | Args
2022-04-11 14:56:48 | admin | set | github: 50161
2010-08-25 04:12:08 | benjamin.peterson | set | status: open -> closed; resolution: rejected; messages: + msg114880
2010-08-25 03:11:15 | methane | set | messages: + msg114879
2010-08-24 20:12:47 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg114814; versions: + Python 3.2, - Python 2.7
2009-11-24 22:45:22 | Trundle | set | nosy: + Trundle
2009-10-12 14:55:23 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg93897
2009-10-12 13:03:16 | facundobatista | set | nosy: + facundobatista
2009-10-03 16:23:04 | methane | set | files: + compile_with_encoding.patch; keywords: + patch; messages: + msg93501
2009-05-08 19:08:41 | ajaksu2 | link | issue1542677 dependencies
2009-05-08 19:08:11 | ajaksu2 | set | priority: normal; versions: - Python 2.6; components: + Interpreter Core, Unicode, - None; stage: test needed
2009-05-03 01:55:23 | methane | create |