Message149612
Actually, this fails on 2.6 and 2.7 with wide unicode builds, and passes with narrow unicode builds (on my 64-bit Linux box).
In pyexpat.c, PyUnknownEncodingHandler reads 256 characters from a unicode buffer without checking its length, and the buffer happens to be only 192 chars long.
So the read runs past the end of the buffer. The function has a comment "supports only 8bit encodings"; indeed.
Versions 3.2 and 3.3 happen to pass the test, probably by pure luck.
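To illustrate the length problem at the Python level, here is a rough sketch, not the actual pyexpat.c code. It decodes the 256 byte values in one call and counts the characters that come back. The helper names `decoded_length` and `check_single_byte` are made up for this example, and the "replace" error handler is an assumption about how the template buffer is decoded. With a multibyte codec the result can be shorter than 256, which is the case an unchecked 256-character read would get wrong.

```python
def decoded_length(encoding: str) -> int:
    """Decode the bytes 0x00..0xFF in one call and return how many characters result."""
    template = bytes(range(256))
    # "replace" substitutes U+FFFD for undecodable input instead of raising.
    return len(template.decode(encoding, "replace"))


def check_single_byte(encoding: str) -> None:
    """Raise ValueError unless decoding the 256 byte values yields exactly 256 characters."""
    n = decoded_length(encoding)
    if n != 256:
        raise ValueError(
            f"{encoding}: expected 256 characters, got {n}; "
            "a fixed 256-entry table cannot be filled safely"
        )


if __name__ == "__main__":
    for name in ("latin-1", "gb2312"):
        print(name, decoded_length(name))
```

A length check like this only guards the read. It does not make multibyte encodings work, and a codec may still produce 256 characters for other reasons (for example, one replacement character per undecodable byte).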
Supporting multibyte codecs won't be easy: pyexpat has to fill an array that specifies the number of bytes needed by each start byte (for example, in utf-8, 0xc3 starts a 2-byte sequence, 0xef starts a 3-byte sequence). Our codecs framework does not provide this information, and some codecs (gb18030 for example) need the second byte to determine whether the sequence will need 4 bytes.
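As a sketch of what "bytes needed by each start byte" means, the snippet below derives the sequence length from a UTF-8 lead byte, and shows why the same approach breaks down for gb18030, where the second byte decides between a 2-byte and a 4-byte sequence. The function names are hypothetical, the gb18030 ranges are simplified, and none of this is the table format expat itself expects.

```python
from typing import Optional


def utf8_sequence_length(lead: int) -> Optional[int]:
    """Return the UTF-8 sequence length implied by a lead byte, or None if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return None  # continuation byte (0x80..0xBF) or invalid lead byte


def gb18030_sequence_length(lead: int, second: int) -> Optional[int]:
    """Simplified gb18030 rule: the length depends on the second byte, not only the lead."""
    if lead < 0x80:
        return 1
    if 0x81 <= lead <= 0xFE:
        if 0x30 <= second <= 0x39:
            return 4
        if 0x40 <= second <= 0xFE and second != 0x7F:
            return 2
    return None


# For UTF-8, a per-lead-byte table can be built from the lead byte alone.
utf8_table = [utf8_sequence_length(b) for b in range(256)]

if __name__ == "__main__":
    print(hex(0xC3), utf8_table[0xC3])
    print(hex(0xEF), utf8_table[0xEF])
    # Same lead byte 0x81, different lengths depending on the second byte.
    print(gb18030_sequence_length(0x81, 0x40), gb18030_sequence_length(0x81, 0x30))
```

For UTF-8 the table is indexed by the lead byte alone; for gb18030 no such single-index table exists, which is the difficulty described above.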
Date | User | Action | Args
2011-12-16 11:26:45 | amaury.forgeotdarc | set | recipients: + amaury.forgeotdarc, dongying
2011-12-16 11:26:45 | amaury.forgeotdarc | set | messageid: <1324034805.33.0.644386454984.issue13612@psf.upfronthosting.co.za>
2011-12-16 11:26:44 | amaury.forgeotdarc | link | issue13612 messages
2011-12-16 11:26:44 | amaury.forgeotdarc | create |