Issue1114093
Created on 2005-02-01 16:13 by manlioperillo, last changed 2005-02-01 17:23 by lemburg.
| Messages (2) | |||
|---|---|---|---|
| msg24126 - (view) | Author: Manlio Perillo (manlioperillo) | Date: 2005-02-01 16:13 | |
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32

    >>> print sys.getdefaultencoding()
    ascii

Regards.

The problem is this code:

    # -*- coding: cp1252 -*-
    >>> u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
    Traceback (most recent call last):
      File "<pyshell#15>", line 1, in ?
        u'\xe0\xe8\xec\xf2\xf9'.decode('latin1')
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I think this is a bug. Indeed this is the behaviour of str.encode:

    >>> '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
    Traceback (most recent call last):
      File "<pyshell#12>", line 1, in ?
        '\xe0\xe8\xec\xf2\xf9'.encode('latin1')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

But this makes no sense for Unicode strings! I think unicode.decode should be a no-op.

Manlio Perillo
|||
| msg24127 - (view) | Author: Marc-Andre Lemburg (lemburg) | Date: 2005-02-01 17:23 | |
What the .encode() and .decode() methods do depends on the codec being used. Your example uses the Latin-1 codec, which encodes from Unicode to 8-bit character strings and decodes the other way around.

As a result, the Unicode string in your first example is first converted to an 8-bit string using the default encoding (which is ASCII), and this fails. The same happens in the second case: Python tries to convert the 8-bit string to Unicode, but this fails because the string contains non-ASCII characters.

If you switch the types of the strings in both examples, you'll have no problem at all.
|||
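The implicit conversion described in msg24127 can be illustrated with a small sketch. It is written for Python 3, where `str.decode` and `bytes.encode` no longer exist, so it only emulates the Python 2 behaviour under the assumption that the coercion is a plain encode or decode with the default encoding. The helper names `py2_unicode_decode` and `py2_str_encode` and the constant `DEFAULT_ENCODING` are hypothetical and are not part of any Python API.

```python
# Python 3 emulation of the implicit coercion Python 2 performed.
# Helper names are hypothetical; they only illustrate the two-step behaviour.

DEFAULT_ENCODING = "ascii"  # value of sys.getdefaultencoding() in the report


def py2_unicode_decode(text, encoding):
    """Emulate Python 2's u'...'.decode(encoding)."""
    # Step 1: implicit Unicode -> 8-bit conversion with the default encoding.
    raw = text.encode(DEFAULT_ENCODING)
    # Step 2: the decode that was actually requested.
    return raw.decode(encoding)


def py2_str_encode(raw, encoding):
    """Emulate Python 2's '...'.encode(encoding) on an 8-bit string."""
    # Step 1: implicit 8-bit -> Unicode conversion with the default encoding.
    text = raw.decode(DEFAULT_ENCODING)
    # Step 2: the encode that was actually requested.
    return text.encode(encoding)


text = "\xe0\xe8\xec\xf2\xf9"   # Unicode string from the report
raw = b"\xe0\xe8\xec\xf2\xf9"   # the same code points as Latin-1 bytes

# First example: fails in step 1, before Latin-1 is ever consulted.
try:
    py2_unicode_decode(text, "latin1")
except UnicodeEncodeError as exc:
    first_error = str(exc)

# Second example: fails in step 1 for the same reason.
try:
    py2_str_encode(raw, "latin1")
except UnicodeDecodeError as exc:
    second_error = str(exc)

# Switching the types, as suggested in the reply, avoids the implicit step.
encoded = text.encode("latin1")
decoded = raw.decode("latin1")
```

In both failing cases the error names the 'ascii' codec rather than Latin-1, which matches the tracebacks in the report: the exception comes from the implicit conversion, not from the codec passed as the argument.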
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2005-02-01 16:13:18 | manlioperillo | create | |