classification
Title: int(u"\u1234") raises UnicodeEncodeError
Type: Stage:
Components: Unicode Versions: Python 2.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: loewis Nosy List: doerwalter, gvanrossum, loewis
Priority: normal Keywords:

Created on 2002-12-11 16:07 by gvanrossum, last changed 2002-12-12 17:35 by doerwalter. This issue is now closed.

Messages (4)
msg13594 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-12-11 16:07
In Python 2.2, int() of a unicode string containing
non-digit characters raises ValueError, like all other
attempts to convert an invalid string or unicode to
int. But in Python 2.3, it appears that int() of a
unicode string is implemented differently and can now
raise UnicodeEncodeError:

>>> int(u"\u1234")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'decimal' codec can't encode
character '\u1234' in position 0: invalid decimal
Unicode string
>>> 

I think it's important that int() of a string or
unicode argument raise only ValueError to indicate
invalid input; otherwise one ends up writing bare
excepts around string-to-int conversions (since it is
too much trouble to keep track of which Python versions
can raise which exceptions).
msg13595 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-12-11 16:39
I don't see the problem:

>>> try:
...   int(u"\u1234")
... except ValueError:
...   print "caught"
...
caught
>>> issubclass(UnicodeEncodeError,ValueError)
True
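The same inheritance can be inspected directly. This sketch is Python 3 (the session above uses Python 2 `print` syntax), but the relevant chain is unchanged: UnicodeEncodeError derives from UnicodeError, which derives from ValueError, so an `except ValueError` clause also catches encoding failures.

```python
# Classes UnicodeEncodeError inherits from, most specific first.
chain = [cls.__name__ for cls in UnicodeEncodeError.__mro__]
print(chain)
# ['UnicodeEncodeError', 'UnicodeError', 'ValueError', 'Exception', 'BaseException', 'object']

# Because of that chain, `except ValueError` catches a failed encode too.
try:
    "\u1234".encode("ascii")
except ValueError as exc:
    caught = type(exc).__name__

print(caught)  # UnicodeEncodeError
```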
msg13596 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-12-11 16:45
Ah, thanks. Sorry.
msg13597 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2002-12-12 17:35
PyUnicode_EncodeDecimal() is responsible for this change.
This function was changed as part of the PEP 293 implementation.
In Python 2.2 it raised a ValueError, which IMHO was a bug:
as an encoding function that encodes unicode to str, it
should raise a UnicodeError when it meets an unencodable
character.
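A rough Python 3 model of the "decimal" encoding step that runs before int() parses a unicode argument. `encode_decimal` is a hypothetical name, and its rules are an approximation of CPython 2.x's PyUnicode_EncodeDecimal, not its actual code: whitespace becomes a space, Unicode decimal digits become ASCII digits, other characters from U+0001 to U+00FF pass through, and anything else is an error. It shows why the exception type depends on the input: U+1234 fails in the encoding step with UnicodeEncodeError, while an ASCII letter passes through and only fails later, in the integer parser, with a plain ValueError.

```python
import unicodedata


def encode_decimal(text):
    """Hypothetical model of the 'decimal' encoding step (not CPython's code).

    Assumed rules, approximating PyUnicode_EncodeDecimal in CPython 2.x:
    whitespace -> ' ', Unicode decimal digits -> ASCII digits,
    other characters in U+0001..U+00FF passed through unchanged,
    anything else -> UnicodeEncodeError (strict error handling).
    """
    out = []
    for i, ch in enumerate(text):
        if ch.isspace():
            out.append(" ")
            continue
        digit = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        if digit is not None:
            out.append(str(digit))
        elif 0 < ord(ch) < 256:
            out.append(ch)
        else:
            raise UnicodeEncodeError("decimal", text, i, i + 1,
                                     "invalid decimal Unicode string")
    return "".join(out)


print(int(encode_decimal(" -\u0661\u0662 ")))  # -12

for sample in ("abc", "\u1234"):
    try:
        int(encode_decimal(sample))
    except ValueError as exc:  # also catches UnicodeEncodeError
        print(ascii(sample), type(exc).__name__)
# 'abc' ValueError
# '\u1234' UnicodeEncodeError
```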
History
Date                 User        Action  Args
2002-12-11 16:07:21  gvanrossum  create