Title: Segfault in UTF-7 incremental decoder
Type: crash
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.3, Python 3.4
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, georg.brandl, larry, ncoghlan, python-dev, serhiy.storchaka, vstinner
Priority: release blocker
Keywords: 3.4regression, buildbot, patch

Created on 2014-02-07 09:32 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files:
issue20538-3.3.patch (uploaded by serhiy.storchaka, 2014-02-07 17:49)
issue20538-3.4.patch (uploaded by serhiy.storchaka, 2014-02-07 17:49)
Messages (10)
msg210444 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-07 09:32
The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent unicode string. Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)
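
For comparison, here is a minimal sketch of what the same three inputs are expected to yield on an interpreter that includes the fix. It assumes that `codecs.utf_7_decode` is called with its default `final=False`, so the decoder stops before the unfinished base-64 section and reports how many bytes it consumed. The variable names are illustrative only.

```python
import codecs

# The three inputs from the report. Each ends inside an unfinished
# base-64 section (no closing '-' or other terminating character).
samples = [b'a+AIA', b'+AIA-+AQA', b'+AQA-+2ADcAA']

# With final=False (the default) the decoder returns only the text that
# precedes the unfinished section, plus the number of bytes consumed.
results = [codecs.utf_7_decode(data, 'strict') for data in samples]
for data, (text, consumed) in zip(samples, results):
    print(data, '->', ascii(text), consumed)

# The same behaviour through the incremental decoder API: the pending
# bytes are buffered and decoded once the section is terminated.
decoder = codecs.getincrementaldecoder('utf-7')()
first = decoder.decode(b'a+AIA')           # only 'a' is available so far
second = decoder.decode(b'-', final=True)  # completes '+AIA-' (U+0080)
print(ascii(first), ascii(second))
```

In each case the returned string contains only characters narrower than the one in the unfinished section, which is why the consistency check on the string's maximum character matters here.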

This happens because _PyUnicodeWriter moves its position back to before the unfinished base-64 section, but its buffer has already been widened for the characters in that section.

        if (inShift) {
            writer.pos = shiftOutStart; /* back off output */
            *consumed = startinpos;

And now _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.
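
The mechanism can be illustrated with a toy model in Python. This is not CPython's `_PyUnicodeWriter`; the class `ToyWriter` and its methods are hypothetical names invented for this sketch. It only shows why moving the position back without revisiting the recorded maximum character leaves the result wider than its contents require. Recomputing the maximum from the kept characters is one possible remedy, shown here as an assumption rather than as a description of the attached patches.

```python
class ToyWriter:
    """Hypothetical stand-in for a text writer that tracks the widest character."""

    def __init__(self):
        self.chars = []
        self.pos = 0        # number of characters considered written
        self.maxchar = 0    # widest code point ever written

    def write(self, ch):
        self.chars.append(ch)
        self.pos += 1
        # Widening is recorded here and never undone.
        self.maxchar = max(self.maxchar, ord(ch))

    def finish_buggy(self):
        # Uses the recorded maxchar even if pos was moved back.
        return ''.join(self.chars[:self.pos]), self.maxchar

    def finish_recomputed(self):
        # Recomputes maxchar from the characters actually kept.
        text = ''.join(self.chars[:self.pos])
        return text, max(map(ord, text), default=0)


writer = ToyWriter()
writer.write('a')
shift_out_start = writer.pos   # start of the base-64 section in the output
writer.write('\x80')           # character decoded from the unfinished section
writer.pos = shift_out_start   # back off output, as in the C snippet

buggy = writer.finish_buggy()            # text 'a' with maxchar 128
recomputed = writer.finish_recomputed()  # text 'a' with maxchar 97
print(buggy, recomputed)
```

In the toy model the "buggy" result pairs an ASCII-only string with a maximum character of 128, which mirrors the mismatch that the `maxchar >= 128` assertion catches in the first example.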

This bug causes a lot of crashes on buildbots. E.g:
msg210472 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-07 14:14
Note that I added a skip for test_readline in issue 20542 before realising this bug had already been filed.
msg210503 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-07 17:49
Here are patches for 3.3 and 3.4 (this is a 3.3+ only bug).
msg210599 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-08 09:37
Patches look good to me.
msg210612 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-08 10:39
Maybe you can add a new truncate operation to the unicode writer? As you want.

The patch looks good to me.
msg210619 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-08 12:09
New changeset 8d40d9cee409 by Serhiy Storchaka in branch '3.3':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when

New changeset e988661e458c by Serhiy Storchaka in branch 'default':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
msg210620 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-08 12:13
Thanks Nick and Victor for your reviews.

Since there is only one place where truncating the unicode writer is needed, I don't think this is worth a special function.
msg210725 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 06:34
This checkin appears to be causing a regression in the Windows buildbots.

test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed

ERROR: test_readline (test.test_codecs.CP65001Test)
Traceback (most recent call last):
  File "C:\\3.x.kloth-win64\build\lib\test\", line 157, in test_readline
    self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
  File "C:\\3.x.kloth-win64\build\lib\test\", line 136, in readalllines
    line = reader.readline(size=size, keepends=keepends)
  File "C:\\3.x.kloth-win64\build\lib\", line 548, in readline
    data =, firstline=True)
  File "C:\\3.x.kloth-win64\build\lib\", line 494, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

Ran 206 tests in 5.912s
msg210726 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 06:58
And to be clear: I'm currently waiting on this before tagging 3.4rc1.  If someone who understands the issue could fix this soon, I would appreciate it.
msg210731 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2014-02-09 09:14
Marking as closed and opening a new issue as per Serhiy's suggestion.
Date User Action Args
2022-04-11 14:57:58 admin set github: 64737
2014-02-09 09:14:58 larry set status: open -> closed
resolution: fixed
messages: + msg210731
stage: needs patch -> resolved
2014-02-09 06:58:06 larry set messages: + msg210726
2014-02-09 06:43:58 larry set status: closed -> open
resolution: fixed -> (no value)
stage: resolved -> needs patch
2014-02-09 06:34:12 larry set messages: + msg210725
2014-02-08 12:13:36 serhiy.storchaka set status: open -> closed
resolution: fixed
messages: + msg210620
stage: patch review -> resolved
2014-02-08 12:09:11 python-dev set nosy: + python-dev
messages: + msg210619
2014-02-08 10:39:27 vstinner set messages: + msg210612
2014-02-08 09:37:24 ncoghlan set messages: + msg210599
2014-02-07 17:50:13 serhiy.storchaka set stage: needs patch -> patch review
2014-02-07 17:49:30 serhiy.storchaka set files: + issue20538-3.3.patch, issue20538-3.4.patch
keywords: + patch
messages: + msg210503
2014-02-07 14:14:13 ncoghlan set priority: high -> release blocker
nosy: + larry, ncoghlan, georg.brandl
messages: + msg210472
keywords: + buildbot, 3.4regression
2014-02-07 14:13:22 ncoghlan link issue20542 superseder
2014-02-07 09:34:16 serhiy.storchaka link issue20520 dependencies
2014-02-07 09:32:08 serhiy.storchaka create