Message 210444 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	serhiy.storchaka
Recipients	ezio.melotti, serhiy.storchaka, vstinner
Date	2014-02-07.09:32:08
Message-id	<1391765528.81.0.362071183373.issue20538@psf.upfronthosting.co.za>

Content

The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent Unicode string. Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)
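
For reference, here is a small sketch of what I would expect the same three inputs to return on a build without the bug. It relies on utf_7_decode() defaulting to final=False, so the unfinished base-64 section should be left unconsumed. The names SAMPLES and decode_partial are made up for this illustration, and running it on an affected debug build would abort as shown above.

```python
import codecs

# The three inputs from the report; each ends inside an unfinished
# base-64 section (no terminating '-').
SAMPLES = [b'a+AIA', b'+AIA-+AQA', b'+AQA-+2ADcAA']


def decode_partial(data):
    """Decode with final=False (the default); return (text, consumed)."""
    return codecs.utf_7_decode(data, 'strict')


results = [decode_partial(sample) for sample in SAMPLES]

for sample, (text, consumed) in zip(SAMPLES, results):
    # The unconsumed tail should start at the '+' that opened the
    # unfinished section.
    print(sample, ascii(text), consumed, sample[consumed:])

# The incremental decoder buffers the unfinished section and emits the
# character only once the section is terminated.
decoder = codecs.getincrementaldecoder('utf-7')()
first = decoder.decode(b'a+AIA')
second = decoder.decode(b'-')
print(ascii(first), ascii(second))
```

In each case the decoded text should contain only the characters before the unfinished section, which is why the string must not keep the wider kind.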

This happens because the _PyUnicodeWriter position is reverted to before the unfinished base-64 section, but the writer's buffer was already widened for the characters in that section.

        if (inShift) {
            writer.pos = shiftOutStart; /* back off output */
            *consumed = startinpos;
        }

As a result, _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.
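
The mismatch can be illustrated with a toy model in Python. This is only a sketch of the idea, not the real _PyUnicodeWriter: ToyWriter, kind_for and the kind names are hypothetical, and the model assumes nothing more than "the recorded maximum character grows on write and is not reset when the position is rewound".

```python
def kind_for(maxchar):
    """Map a maximum code point to a storage kind (toy naming)."""
    if maxchar < 0x80:
        return 'ascii'
    if maxchar < 0x100:
        return 'ucs1'
    if maxchar < 0x10000:
        return 'ucs2'
    return 'ucs4'


class ToyWriter:
    """Hypothetical stand-in for a writer that only ever widens."""

    def __init__(self):
        self.buf = []
        self.pos = 0
        self.maxchar = 0

    def write(self, ch):
        del self.buf[self.pos:]          # overwrite anything past pos
        self.buf.append(ch)
        self.pos += 1
        self.maxchar = max(self.maxchar, ord(ch))  # widen, never narrow

    def finish(self):
        text = ''.join(self.buf[:self.pos])
        claimed = kind_for(self.maxchar)
        actual = kind_for(max(map(ord, text), default=0))
        return text, claimed, actual


writer = ToyWriter()
writer.write('a')
shift_out_start = writer.pos      # output position where the shift began
writer.write('\x80')              # character from the unfinished section
writer.pos = shift_out_start      # back off output, as in the C snippet
text, claimed, actual = writer.finish()
print(ascii(text), claimed, actual)
```

Here the text is back to plain ASCII while the recorded kind still reflects the discarded character, which corresponds to the first assertion failure (maxchar >= 128) in the examples.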

This bug causes a lot of crashes on the buildbots. For example:
http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446

History
Date	User	Action	Args
2014-02-07 09:32:08	serhiy.storchaka	set	recipients: + serhiy.storchaka, vstinner, ezio.melotti
2014-02-07 09:32:08	serhiy.storchaka	set	messageid: <1391765528.81.0.362071183373.issue20538@psf.upfronthosting.co.za>
2014-02-07 09:32:08	serhiy.storchaka	link	issue20538 messages
2014-02-07 09:32:08	serhiy.storchaka	create