Title: unichr integer overflow
Components: Unicode
Versions: Python 2.6, Python 2.5
Assigned To: amaury.forgeotdarc
Nosy List: amaury.forgeotdarc, schmir
Created on 2008-07-31 14:53 by schmir, last changed 2022-04-11 14:56 by admin.

Messages (3)
Author: Ralf Schmitt (schmir) Date: 2008-07-31 14:53
unichr(2**32) results in a unicode string containing a 0 byte:

~/mwlib.hg/tests/ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:17:30)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unichr(2**32)
>>> unichr(2**32+1)
>>> unichr(2**32+2)

2.6 shows the same behaviour.

Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2008-07-31 15:20
This happens on architectures where sizeof(long) > sizeof(int):
builtin_unichr() converts its argument to a long, but calls
PyUnicode_FromOrdinal() which accepts an int.
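
For illustration, the narrowing described above can be imitated in pure Python. This is only a sketch: truncate_to_c_int is a hypothetical helper, not part of CPython, and it assumes a 32-bit C int with two's-complement wraparound (the C standard leaves the out-of-range conversion implementation-defined, though common compilers behave this way).

```python
def truncate_to_c_int(value, bits=32):
    """Imitate passing a wider integer to a C function taking an int.

    Assumption: int is `bits` wide and out-of-range values wrap around
    (two's complement), as common compilers do in practice.
    """
    mask = (1 << bits) - 1
    value &= mask                   # keep only the low `bits` bits
    if value >= 1 << (bits - 1):    # reinterpret the top bit as the sign
        value -= 1 << bits
    return value


# The ordinals from the report all lose their high bits:
for n in (2**32, 2**32 + 1, 2**32 + 2):
    print(n, "->", truncate_to_c_int(n))
# 4294967296 -> 0
# 4294967297 -> 1
# 4294967298 -> 2
```

Under these assumptions 2**32 collapses to ordinal 0, which matches the 0 byte seen in the report.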

Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2008-07-31 21:30
Committed r65339.
Will not backport to 2.5: code that used to (approximately) work would break.

Thanks for the report!
