Issue 999444: compiler module doesn't support unicode characters in literals

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40655

classification

Title:	compiler module doesn't support unicode characters in literals
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 2.4

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	nascheme	Nosy List:	BreamoreBoy, dcjim, jhylton, lemburg, mwh, nascheme, nnorwitz
Priority:	normal	Keywords:

Created on 2004-07-28 14:00 by dcjim, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg21835	Author: Jim Fulton (dcjim)	Date: 2004-07-28 14:00
I'm not positive that this is a bug. The built-in compile function accepts Unicode with non-ASCII text in literals, but compiler.compile does not:

>>> text = u"print u'''\u0442\u0435\u0441\u0442'''"
>>> exec compile(text, 's', 'exec')
тест
>>> import compiler
>>> exec compiler.compile(text, 's', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 64, in compile
    gen.compile()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 111, in compile
    tree = self._get_tree()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 77, in _get_tree
    tree = parse(self.source, self.mode)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 50, in parse
    return Transformer().parsesuite(buf)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 120, in parsesuite
    return self.transform(parser.suite(text))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
>>>
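For comparison, a minimal Python 3 sketch of the same check, assuming Python 3.8 or later (where string literals parse to ast.Constant). The compiler package no longer exists, so ast.parse() is used here as a stand-in for it; that substitution is ours, not part of the report. The helper name run_and_parse is hypothetical. In Python 3 both compile() and ast.parse() accept str source containing non-ASCII literals without any encoding step.

```python
import ast


def run_and_parse(text: str):
    """Hypothetical helper (Python 3.8+): compile, run and parse str source.

    Expects source whose first statement assigns a string literal to s.
    """
    # The built-in compile() accepts str source directly; no encoding step.
    namespace = {}
    exec(compile(text, "s", "exec"), namespace)
    # ast.parse() stands in here for the removed compiler package.
    tree = ast.parse(text)
    literal = tree.body[0].value  # right-hand side of the assignment
    return namespace["s"], literal.value


# The report's Cyrillic literal, assigned to s instead of printed.
source = "s = '''\u0442\u0435\u0441\u0442'''"
value, parsed = run_and_parse(source)
assert value == parsed == "\u0442\u0435\u0441\u0442"
```

Assigning instead of printing keeps the check independent of the terminal's output encoding, which is what produced garbled output in many 2004-era sessions.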
msg21836	Author: Jim Fulton (dcjim)	Date: 2004-07-28 14:02
Also in 2.3.
msg21837	Author: Michael Hudson (mwh)	Date: 2004-07-29 11:19
The immediate problem is that the parser module doesn't support Unicode:

>>> import parser
>>> parser.suite(u"print u'''\u0442\u0435\u0441\u0442'''")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)

There may well be more bugs lurking in Lib/compiler with respect to this issue, but this is the first one. I don't know how easy this will be to fix (looking at what the built-in compile() function does with Unicode might be a good start).
msg21838	Author: Michael Hudson (mwh)	Date: 2004-07-29 11:30
Thinking about this a little harder, doing a proper job probably involves mucking around in the depths of Python to support source-as-Unicode throughout. The vile solution is this sort of thing:

>>> parser.suite('# coding: utf-8\n' + u"print u'''\u0442\u0435\u0441\u0442'''".encode('utf-8'))
<parser.st object at 0x107770>
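The same workaround carries over to Python 3, where the built-in compile() accepts bytes and honours a PEP 263 coding declaration on the first or second line. A minimal sketch, assuming Python 3; compile_with_cookie is a hypothetical helper name, and cp1251 is chosen only to show that a non-UTF-8 declaration is respected as well.

```python
def compile_with_cookie(source: str, filename: str = "<s>", encoding: str = "utf-8"):
    """Hypothetical helper: encode str source and prepend a PEP 263 coding
    declaration, the same idea as the parser.suite() workaround in msg21838."""
    # The declaration must sit on the first or second line of the source.
    header = "# coding: %s\n" % encoding
    raw = (header + source).encode(encoding)
    # Given bytes, compile() decodes them using the declared encoding.
    return compile(raw, filename, "exec")


namespace = {}
exec(compile_with_cookie("s = '\u0442\u0435\u0441\u0442'", encoding="cp1251"), namespace)
assert namespace["s"] == "\u0442\u0435\u0441\u0442"
```

The header and the payload must be encoded with the same codec the header names; a mismatch makes the tokenizer decode the literal incorrectly or raise a SyntaxError.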
msg21839	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-07-29 11:38
Note that the tokenizer converts the input string into UTF-8 (transcoding it as necessary if a source code encoding declaration, i.e. a PEP 263 "coding:" comment, is found), and the compiler assumes this encoding when creating Unicode literals. I'm not sure whether the compiler package is up to date with respect to these internal changes in the C-based compiler.
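The transcoding step described above can be approximated at the Python level with tokenize.detect_encoding(), which looks for a BOM or a PEP 263 declaration in the first two lines. This is a rough Python 3 illustration under that assumption, not the C tokenizer itself; transcode_to_utf8 is a hypothetical name.

```python
import io
import tokenize


def transcode_to_utf8(raw: bytes):
    """Hypothetical helper: a rough Python-level analogue of the tokenizer
    step: find the declared source encoding, decode, re-encode as UTF-8."""
    # detect_encoding() reads at most two lines looking for a BOM or cookie;
    # it falls back to UTF-8 when neither is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    text = raw.decode(encoding)
    return encoding, text.encode("utf-8")


raw = "# coding: cp1251\ns = '\u0442\u0435\u0441\u0442'\n".encode("cp1251")
encoding, utf8 = transcode_to_utf8(raw)
assert encoding == "cp1251"
assert "\u0442\u0435\u0441\u0442".encode("utf-8") in utf8
```

Note that the returned UTF-8 bytes still contain the original "# coding: cp1251" line, so they are meant for inspection only and should not be passed back to compile() as-is.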
msg21840	Author: Neal Norwitz (nnorwitz) *	Date: 2006-02-25 22:00
FYI
msg114368	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-08-19 15:38
The compiler package has been removed from py3k.

History
Date	User	Action	Args
2022-04-11 14:56:06	admin	set	github: 40655
2010-08-19 15:38:49	BreamoreBoy	set	status: open -> closed nosy: + BreamoreBoy messages: + msg114368 resolution: out of date
2009-02-07 01:00:38	nascheme	set	assignee: jhylton -> nascheme nosy: + nascheme
2004-07-28 14:00:11	dcjim	create