classification
Title: unichr integer overflow
Type: behavior Stage:
Components: Unicode Versions: Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: amaury.forgeotdarc Nosy List: amaury.forgeotdarc, schmir
Priority: normal Keywords:

Created on 2008-07-31 14:53 by schmir, last changed 2008-07-31 21:30 by amaury.forgeotdarc. This issue is now closed.

Messages (3)
msg70513 - Author: Ralf Schmitt (schmir) Date: 2008-07-31 14:53
unichr(2**32) returns a one-character unicode string containing a NUL character (u'\x00'):

{{{
~/mwlib.hg/tests/ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:17:30)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unichr(2**32)
u'\x00'
>>> unichr(2**32+1)
u'\x01'
>>> unichr(2**32+2)
u'\x02'
}}}

2.6 shows the same behaviour.
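The three outputs above follow one pattern: each result equals the argument with everything above its low 32 bits thrown away. A minimal Python 3 sketch can model this, assuming a 32-bit C `int` and the two's-complement wrap-around that common compilers apply when narrowing. `to_c_int` and `buggy_unichr` are hypothetical helpers for illustration, and Python 3's `chr` stands in for 2.x's `unichr`.

```python
# Hypothetical model of the reported bug: the argument is narrowed to a
# signed C int (keeping only its low 32 bits) before the character lookup.
INT_BITS = 32  # assumption: sizeof(int) == 4, as on typical Linux platforms


def to_c_int(value):
    """Emulate an unchecked conversion of a Python integer to a signed C int."""
    low = value & ((1 << INT_BITS) - 1)  # keep only the low 32 bits
    if low >= 1 << (INT_BITS - 1):       # reinterpret the top bit as the sign
        low -= 1 << INT_BITS
    return low


def buggy_unichr(value):
    """Narrow first, then convert: this silently aliases 2**32 to 0."""
    return chr(to_c_int(value))  # chr() plays the role of 2.x unichr()


for n in (2**32, 2**32 + 1, 2**32 + 2):
    print(repr(buggy_unichr(n)))
```

Under this model, any argument that is a multiple of 2**32 away from a valid code point silently maps onto that code point.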
msg70519 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-07-31 15:20
Confirmed.
This happens on architectures where sizeof(long) > sizeof(int) (e.g. 64-bit Linux):
builtin_unichr() converts its argument to a long, but then passes it to
PyUnicode_FromOrdinal(), which accepts an int, so on common platforms only
the low 32 bits of the value survive.
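One way to avoid this kind of truncation is to range-check the value before narrowing it, so that out-of-range arguments raise an exception instead of wrapping around. The sketch below is a hypothetical Python model of that idea, not a reproduction of the actual fix. It assumes a 32-bit C `int` and a wide (UCS-4) build, where the largest code point is 0x10FFFF (narrow 2.x builds stop at 0xFFFF). `checked_unichr` is an illustrative name, and its error messages are invented, not CPython's.

```python
# Hypothetical checked variant of unichr(): validate first, convert second.
# Assumptions: a 32-bit signed C int, and a wide (UCS-4) build where code
# points run from 0 to 0x10FFFF.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
MAX_CODE_POINT = 0x10FFFF


def checked_unichr(value):
    """Return the character for `value`, refusing to truncate it."""
    if not INT_MIN <= value <= INT_MAX:
        # The value cannot be represented as a C int; fail loudly rather
        # than keep only its low 32 bits.
        raise OverflowError("argument does not fit in a C int: %d" % value)
    if not 0 <= value <= MAX_CODE_POINT:
        # Fits in an int but is not a valid code point.
        raise ValueError("argument not in range(0x110000): %d" % value)
    return chr(value)  # Python 3's chr() plays the role of 2.x unichr()


for n in (0x41, 2**32):
    try:
        print(repr(checked_unichr(n)))
    except (OverflowError, ValueError) as exc:
        print(type(exc).__name__ + ":", exc)
```

With the checks ordered this way, 2**32 is rejected before it can alias to u'\x00', while every valid code point behaves exactly as before.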
msg70531 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-07-31 21:30
Committed r65339.
Will not backport to 2.5: code that currently (approximately) works would
break.

Thanks for the report!
History
Date                 User                Action  Args
2008-07-31 21:30:37  amaury.forgeotdarc  set     status: open -> closed
                                                 resolution: fixed
                                                 messages: + msg70531
2008-07-31 15:20:51  amaury.forgeotdarc  set     assignee: amaury.forgeotdarc
                                                 messages: + msg70519
                                                 nosy: + amaury.forgeotdarc
2008-07-31 14:53:44  schmir              create