Author serhiy.storchaka
Recipients ezio.melotti, serhiy.storchaka, vstinner
Date 2014-02-07.09:32:08
The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent Unicode string. Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)
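For comparison, a decoder without this bug should return only the text before the unfinished base-64 section, along with the number of bytes it consumed, and leave the rest for the next call. The sketch below shows the values expected for the three inputs above. These values were worked out by hand from the UTF-7 rules and were not taken from the tracker. It also shows how the incremental decoder returned by `codecs.getincrementaldecoder("utf-7")` carries the unconsumed bytes over to the next chunk.

```python
import codecs

# final defaults to False, so an unterminated base-64 section is left
# unconsumed: the result is (text before the section, bytes consumed).
cases = [
    (b"a+AIA", ("a", 1)),
    (b"+AIA-+AQA", ("\x80", 5)),
    (b"+AQA-+2ADcAA", ("\u0100", 5)),
]
for data, expected in cases:
    result = codecs.utf_7_decode(data, "strict")
    print(data, result)
    assert result == expected

# The incremental decoder buffers the unconsumed b"+AIA" and decodes it
# once the terminating b"-" arrives.
decoder = codecs.getincrementaldecoder("utf-7")()
first = decoder.decode(b"a+AIA")
second = decoder.decode(b"-", final=True)
print(repr(first), repr(second))
```

On an affected non-debug build, the returned prefix may compare and print correctly but still be stored in a representation that is too wide.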

This happens because the decoder moves the _PyUnicodeWriter position back to before the unfinished base-64 section, but the writer's buffer has already been widened for the characters decoded from that section:

        if (inShift) {
            writer.pos = shiftOutStart; /* back off output */
            *consumed = startinpos;

As a result, _PyUnicodeWriter produces a string whose kind is larger than needed for the characters it actually contains.
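The failed assertions check that a string's storage kind is no wider than its largest character requires: `maxchar >= 128` for a non-ASCII 1-byte string, `>= 0x100` for a 2-byte string, and `>= 0x10000` for a 4-byte string. Below is a rough Python model of that rule, applied to the three examples. It shows the mismatch: the buffer kind was chosen for characters in the discarded section, while the kept prefix needs a narrower one. The `narrowest_kind` helper and its labels are hypothetical. CPython performs the real check in C, in `_PyUnicode_CheckConsistency`.

```python
# Hypothetical model (not a CPython API): a compact str should use the
# narrowest representation that fits its largest code point.
def narrowest_kind(s: str) -> str:
    maxchar = max(map(ord, s), default=0)
    if maxchar < 0x80:
        return "ascii"  # 1 byte per char, ASCII-only
    if maxchar < 0x100:
        return "ucs1"   # 1 byte per char, Latin-1 range
    if maxchar < 0x10000:
        return "ucs2"   # 2 bytes per char
    return "ucs4"       # 4 bytes per char

# (text the writer decoded, text kept after backing off), one pair per
# example in the report.
cases = [
    ("a\x80", "a"),
    ("\x80\u0100", "\x80"),
    ("\u0100\U00010000", "\u0100"),
]
for decoded, kept in cases:
    # The buffer was widened for `decoded`; the result only needs `kept`.
    print(f"buffer {narrowest_kind(decoded)}, needed {narrowest_kind(kept)}")
```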

This bug causes a lot of crashes on the buildbots, e.g.: