Message210444
The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent Unicode string. Minimal examples:
$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)
$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)
$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)
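To see why each example trips a different assertion, it helps to look at the character encoded by the unfinished base-64 section in each input. The sketch below is only an illustration: it terminates each section with '-' so that it decodes completely, and the names `sections` and `pending` are made up for this example.

```python
# Unfinished base-64 section from each example above, terminated
# with '-' so that the codec decodes it completely.
sections = [b'+AIA-', b'+AQA-', b'+2ADcAA-']

# Each section encodes exactly one character.
pending = [s.decode('utf-7') for s in sections]

for raw, ch in zip(sections, pending):
    print(raw, '->', 'U+%04X' % ord(ch))
```

The three characters are U+0080, U+0100 and U+10000, i.e. the smallest code points that need a non-ASCII 1-byte, a 2-byte and a 4-byte representation, which matches the three maxchar assertions above.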
This happens because the decoder moves the _PyUnicodeWriter position back to before the unfinished base-64 section, but the writer's buffer has already been widened for the characters in that section:
if (inShift) {
    writer.pos = shiftOutStart; /* back off output */
    *consumed = startinpos;
}
As a result, _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.
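A possible regression check, written as a sketch rather than as the actual test for this issue: it assumes an interpreter in which the problem is fixed, so that only the text before the unfinished section is returned and the section itself is left unconsumed. The names `cases` and `results` are made up for this example.

```python
import codecs

# (input, expected text, expected number of consumed bytes).
# The expected values assume a fixed interpreter: the unfinished
# base-64 section is not consumed and contributes no characters.
cases = [
    (b'a+AIA', 'a', 1),
    (b'+AIA-+AQA', '\x80', 5),
    (b'+AQA-+2ADcAA', '\u0100', 5),
]

# final defaults to False, so an unfinished section is not an error.
results = [codecs.utf_7_decode(data, 'strict') for data, _, _ in cases]
for (data, text, consumed), result in zip(cases, results):
    assert result == (text, consumed), (data, result)

# The same behaviour through the incremental decoder: the pending
# section is buffered and decoded once it is terminated.
decoder = codecs.getincrementaldecoder('utf-7')()
first = decoder.decode(b'a+AIA')
second = decoder.decode(b'-', final=True)
print(repr(first), repr(second))
```

In a debug build affected by this bug, the first call already aborts, so the script doubles as a reproducer.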
This bug causes a lot of crashes on the buildbots, e.g.:
http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446
Date | User | Action | Args
2014-02-07 09:32:08 | serhiy.storchaka | set | recipients: + serhiy.storchaka, vstinner, ezio.melotti
2014-02-07 09:32:08 | serhiy.storchaka | set | messageid: <1391765528.81.0.362071183373.issue20538@psf.upfronthosting.co.za>
2014-02-07 09:32:08 | serhiy.storchaka | link | issue20538 messages
2014-02-07 09:32:08 | serhiy.storchaka | create |