Message96732
All string length calculations in Python 2.4 are done using C ints,
which are 32-bit even on 64-bit platforms.
Since UTF-8 can use up to 4 bytes per Unicode code point, the encoder
over-allocates its output buffer to len*4 bytes. That size
goes straight over the 2GB limit the 32-bit int imposes if
you try to encode a Unicode string of 512M code points.
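
To make the arithmetic concrete, here is a minimal sketch of the size check, not the actual encoder code. The names INT_MAX, worst_case_utf8_size and fits_in_int are made up for illustration and do not come from the CPython source.

```python
# Illustration only: these names are hypothetical, not CPython internals.
INT_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold


def worst_case_utf8_size(code_points):
    """Bytes reserved when allowing 4 bytes per code point (len*4)."""
    return code_points * 4


def fits_in_int(code_points):
    """True if the worst-case buffer size is representable as a 32-bit int."""
    return worst_case_utf8_size(code_points) <= INT_MAX


print(fits_in_int(512 * 2**20 - 1))  # one code point short of 512M
print(fits_in_int(512 * 2**20))      # exactly 512M code points
```

With 512M code points, len*4 is exactly 2**31 bytes, which is one more than a signed 32-bit int can represent.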
The reason for using ints to represent string length is simple:
at the time the string API was designed, no one really expected
anyone to work with 2GB strings in memory (large hard drives
held around 2GB at that time). Strings of that size are
simply not supported by Python 2.4.
BTW: I wouldn't really count on Python 2.4 working properly on
64-bit platforms. A lot of issues related to 32/64-bit differences
were fixed in Python 2.5.

Date | User | Action | Args
2009-12-21 09:24:34 | lemburg | set | recipients: + lemburg, loewis, ajung, mark.dickinson
2009-12-21 09:24:32 | lemburg | link | issue7551 messages
2009-12-21 09:24:31 | lemburg | create |