classification
Title: Python code.interact() and UTF-8 locale
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: georg.brandl, haypo, kmtracey, pitrou
Priority: normal
Keywords: patch

Created on 2005-09-12 11:40 by haypo, last changed 2008-08-10 12:24 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
code-interact.patch haypo, 2005-09-14 21:07 The code.interact() patch
Messages (9)
msg26260 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2005-09-12 11:40
Hi,

I found a bug in the Python interactive command line
(the bare python program; it looks to be the code.interact()
function in code.py). With a UTF-8 locale, the command <<
u"é" >> returns << u'\xc3\xa9' >> and not << u'\xe9'
>>. Remember: the French e with acute accent is Unicode 233
(0xE9), encoded as \xC3 \xA9 in UTF-8.
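
The two representations can be reproduced directly. The following is a minimal sketch in Python 3 syntax (the report itself concerns Python 2), and the variable names are chosen for illustration only. It shows how the UTF-8 bytes of é become two characters when each byte is taken as its own code point, which is what Latin-1 decoding does.

```python
# Python 3 sketch; the original report is about Python 2.
char = "\u00e9"                      # e with acute accent, code point 233 (0xE9)
utf8_bytes = char.encode("utf-8")    # the two bytes 0xC3 0xA9

# Treating each byte as its own code point (equivalent to Latin-1 decoding)
# reproduces the wrong value shown in the report.
mangled = utf8_bytes.decode("latin-1")

# Decoding with the encoding the terminal actually used gives the expected value.
correct = utf8_bytes.decode("utf-8")

print(ascii(mangled))   # '\xc3\xa9'
print(ascii(correct))   # '\xe9'
```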

Another example of the bug:
  #-*- coding: UTF-8 -*-
  code = "u\"%s\"" % "\xc3\xa9"
  compiled = compile(code,'<string>',"single")
  exec compiled
Result:
  u'\xc3\xa9'
Expected result:
  u'\xe9'

After long hours of debugging (reading the Python
documentation, debugging Python with gdb, reading the Python C
source code, ...) I found the origin of the bug: the
function parsestr() in Python/compile.c. This function
translates a string literal to a unicode string (or a classic
string). The problem arises when there is no encoding
declaration: the string isn't converted.

Solution to the first code:
  #-*- coding: ascii -*-
  code = """#-*- coding: UTF-8 -*-
  u\"%s\"""" % "\xc3\xa9"
  compiled = compile(code,'<string>',"single")
  exec compiled
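
The effect of an encoding declaration on compiled source can be sketched as follows. This is an illustration in Python 3, not the Python 2 code above: Python 3 compile() accepts bytes and honors a PEP 263 coding line, and Latin-1 is used here only because Python 3 already assumes UTF-8 when no declaration is present. The names source and namespace are chosen for the example.

```python
# Python 3 sketch: the coding line tells the compiler how to decode the bytes.
source = b'# -*- coding: latin-1 -*-\ntext = "\xe9"\n'

namespace = {}
exec(compile(source, "<string>", "exec"), namespace)

# The single byte 0xE9 was decoded as Latin-1, giving the code point U+00E9.
print(ascii(namespace["text"]))
```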

Proposal: u"..." and unicode("...") should use
sys.stdin.encoding by default. They would work as
unicode("...", sys.stdin.encoding). Or, more simply, the
compiler should use sys.stdin.encoding and not ascii as
the default encoding.

Sorry if someone has already reported this bug. And is it
a bug or a feature? ;-)

Bye, Haypo
msg26261 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2005-09-12 12:46

OK, after a long discussion with RexFi on IRC, I understood
that Python can't *guess* the string encoding ... I agree with
that: the system locale or the source encoding are not good choices.

But ... the Python console has a bug. It uses raw_input(). So I
wrote a patch that just adds the right unicode cast. But the Python
console doesn't look to be code.interact().
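
The kind of conversion described here can be sketched as below. This is not the attached patch: decode_console_line is a hypothetical helper written in Python 3, where the raw line is modelled as bytes, and it only illustrates decoding console input with the encoding reported by sys.stdin.

```python
import sys


def decode_console_line(line, encoding=None):
    """Return `line` as text, decoding bytes with the console encoding.

    Hypothetical helper for illustration; it is not the attached patch.
    """
    if encoding is None:
        # sys.stdin may have been replaced by an object without .encoding.
        encoding = getattr(sys.stdin, "encoding", None)
    if encoding and isinstance(line, bytes):
        return line.decode(encoding)
    # Already text, or no encoding known: leave the line untouched.
    return line


raw = "u'\u00e9'".encode("utf-8")   # what a UTF-8 terminal would send
print(ascii(decode_console_line(raw, "utf-8")))
```

Passing the encoding explicitly, as in the last line, keeps the example independent of the terminal it runs in.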

I attach the patch to this comment.

Haypo
msg26262 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-09-14 20:03

There's no uploaded file!  You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file.

Please try again.

(This is a SourceForge annoyance that we can do
nothing about. :-( )
msg70812 - (view) Author: Karen Tracey (kmtracey) Date: 2008-08-07 04:47
I just stumbled on this bug; it is still a problem in 2.5 and 2.6.  I
tried the supplied patch on 2.6b2 and it works.  Before the patch:

Python 2.6b2 (r26b2:65082, Jul 18 2008, 13:36:54) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ustr = u'¿Cómo'      
>>> print ustr
¿Cómo
>>> import code
>>> code.interact()
Python 2.6b2 (r26b2:65082, Jul 18 2008, 13:36:54) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> ustr = u'¿Cómo'      
>>> print ustr
Â¿CÃ³mo

After the patch:

Python 2.6b2 (r26b2:65082, Jul 18 2008, 13:36:54) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ustr = u'¿Cómo'
>>> print ustr
¿Cómo
>>> import code
>>> code.interact()
Python 2.6b2 (r26b2:65082, Jul 18 2008, 13:36:54) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> ustr = u'¿Cómo'
>>> print ustr
¿Cómo

I realize it's a pretty little problem, but it was quite puzzling to
track down, because naturally I wasn't doing that exactly but rather
using a tool that under the covers was using code.interact() and mostly
behaves just like a bare python prompt except it was mangling unicode
string literals.  Any chance the fix could get in the code base?  The
last comment makes it sound like the patch was missing at one point. 
It's there now.  Is there any concern about it breaking something?
msg70813 - (view) Author: Karen Tracey (kmtracey) Date: 2008-08-07 05:12
FWIW I also tried the fix on a Windows box with Python 2.5.1.  The
failure there is different since the Windows command prompt apparently
uses cp437 as its encoding:

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ustr = u'¿Cómo'
>>> ustr
u'\xbfC\xf3mo'
>>> print ustr
¿Cómo
>>> import code
>>> code.interact()
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> ustr = u'¿Cómo'
>>> print ustr
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "d:\bin\Python2.5.1\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa8' in
position 0: character maps to <undefined>

Applying the patch resulted in correct behavior on Windows as well.
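
The Windows failure can be traced at the byte level. The following is a sketch in Python 3 under the assumption stated above, namely that the console encoding is cp437 and that the unpatched console takes each input byte as a code point; the variable names are illustrative.

```python
# Python 3 sketch of the byte-level steps; the session above is Python 2.5.
typed = "\u00bfC\u00f3mo"                 # the text typed at the prompt

# A cp437 console delivers these bytes to the interpreter.
raw = typed.encode("cp437")               # b'\xa8C\xa2mo'

# Taking each byte as a code point yields U+00A8 and U+00A2 instead.
mangled = raw.decode("latin-1")

try:
    mangled.encode("cp437")               # what print has to do on output
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start                 # index of the first unencodable character

print(ascii(mangled), failed_at)
```

U+00A8 has no cp437 equivalent, which matches the character and position named in the traceback.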
msg70850 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-07 18:43
Fixed in r65578.
msg70852 - (view) Author: Karen Tracey (kmtracey) Date: 2008-08-07 18:52
Cool, thanks! Do I take it from the Versions setting that the fix will
be available in the next 2.6 beta but not get propagated to prior
releases?  (I'm not very familiar with this issue tracker so am just
trying to understand what the various fields mean.)
msg70853 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-07 18:58
On Thursday 07 August 2008 at 18:52 +0000, Karen Tracey wrote:
> Karen Tracey <kmtracey@gmail.com> added the comment:
> 
> Cool, thanks! Do I take it from the Versions setting that the fix will
> be available in the next 2.6 beta but not get propagated to prior
> releases?  (I'm not very familiar with this issue tracker so am just
> trying to understand what the various fields mean.)

Indeed, the fix will be present in the next 2.6 beta.

As for the 2.5 branch, it is in maintenance mode and we want to minimize
the amount of potential breakage that we might cause there. I don't
think the present bug is important enough to warrant a backport, but
other developers may disagree and fix 2.5 as well :-)

(as for 3.0, it is unaffected)
msg70977 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-08-10 12:24
@kmtracey: Great and thanks! Three years later, the bug is finally 
fixed :-)
History
Date                 User      Action  Args
2008-08-10 12:24:45  haypo     set     messages: + msg70977
2008-08-07 18:58:47  pitrou    set     messages: + msg70853
2008-08-07 18:52:29  kmtracey  set     messages: + msg70852
2008-08-07 18:43:27  pitrou    set     status: open -> closed
                                       resolution: fixed
                                       messages: + msg70850
2008-08-07 11:45:47  pitrou    set     keywords: + patch
                                       nosy: + pitrou
                                       type: behavior
                                       versions: - Python 2.5, Python 2.4, Python 2.3
2008-08-07 05:12:38  kmtracey  set     messages: + msg70813
2008-08-07 04:47:55  kmtracey  set     nosy: + kmtracey
                                       messages: + msg70812
                                       versions: + Python 2.6, Python 2.5, Python 2.4
2005-09-12 11:40:00  haypo     create