classification
Title: Segfault in UTF-7 incremental decoder
Type: crash
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.4, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, georg.brandl, haypo, larry, ncoghlan, python-dev, serhiy.storchaka
Priority: release blocker
Keywords: 3.4regression, buildbot, patch

Created on 2014-02-07 09:32 by serhiy.storchaka, last changed 2014-02-09 09:14 by larry. This issue is now closed.

Files
File name Uploaded
issue20538-3.3.patch serhiy.storchaka, 2014-02-07 17:49
issue20538-3.4.patch serhiy.storchaka, 2014-02-07 17:49
Messages (10)
msg210444 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-02-07 09:32
The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent unicode string. Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)

This happens because _PyUnicodeWriter moves its position back to before the unfinished base-64 section, but its buffer has already been widened for the characters in that section.

        if (inShift) {
            writer.pos = shiftOutStart; /* back off output */
            *consumed = startinpos;
        }

As a result, _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.
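
For illustration, here is a minimal sketch of the same scenario through the public incremental decoder API. It assumes a build that already includes the fix (an affected debug build may abort on the first chunk). The helper name decode_in_chunks is made up for this example. The first chunk ends inside a base-64 section, so the decoder is expected to hold that section back until the closing '-' arrives:

```python
import codecs

def decode_in_chunks(chunks):
    """Feed byte chunks to a UTF-7 incremental decoder and collect the output."""
    decoder = codecs.getincrementaldecoder('utf-7')()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes whatever the decoder was still holding back.
    pieces.append(decoder.decode(b'', final=True))
    return pieces

# b'+AIA' is an unfinished base-64 section (it encodes U+0080);
# the '-' that closes it only arrives with the second chunk.
pieces = decode_in_chunks([b'a+AIA', b'-b'])
print(pieces)
# The chunked result should match decoding the whole input at once.
print(''.join(pieces) == b'a+AIA-b'.decode('utf-7'))
```

Comparing the chunked result with a one-shot decode is a cheap consistency check for any incremental codec.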

This bug causes a lot of crashes on the buildbots, e.g.:
http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446
msg210472 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-02-07 14:14
Note that I added a skip for test_readline in issue 20542 before realising this bug had already been filed.
msg210503 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-02-07 17:49
Here are patches for 3.3 and 3.4 (this bug affects only 3.3+).
msg210599 - Author: Nick Coghlan (ncoghlan) (Python committer) Date: 2014-02-08 09:37
Patches look good to me.
msg210612 - Author: STINNER Victor (haypo) (Python committer) Date: 2014-02-08 10:39
Maybe you could add a new truncate operation to the unicode writer? As you wish.

The patch looks good to me.
msg210619 - Author: Roundup Robot (python-dev) Date: 2014-02-08 12:09
New changeset 8d40d9cee409 by Serhiy Storchaka in branch '3.3':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
http://hg.python.org/cpython/rev/8d40d9cee409

New changeset e988661e458c by Serhiy Storchaka in branch 'default':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
http://hg.python.org/cpython/rev/e988661e458c
msg210620 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-02-08 12:13
Thanks Nick and Victor for your reviews.

Since there is only one place where truncating the unicode writer is needed, I don't think it is worth a special function.
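
To illustrate what such a truncate operation would have to do, here is a toy pure-Python model. ToyWriter, truncate and kind are hypothetical names for this sketch and are not the _PyUnicodeWriter API. The point is only that backing off the position must also recompute the maximum character, otherwise the result keeps a wider kind than its contents need:

```python
class ToyWriter:
    """Hypothetical stand-in for a growable text buffer that tracks maxchar."""

    def __init__(self):
        self.chars = []
        self.maxchar = 0  # widest code point written so far

    def write(self, ch):
        self.chars.append(ch)
        self.maxchar = max(self.maxchar, ord(ch))

    def truncate(self, pos):
        """Drop everything from pos onward and recompute maxchar."""
        del self.chars[pos:]
        self.maxchar = max(map(ord, self.chars), default=0)


def kind(maxchar):
    """Bytes per character needed for a given maximum code point."""
    if maxchar < 0x100:
        return 1
    if maxchar < 0x10000:
        return 2
    return 4


writer = ToyWriter()
writer.write('a')
shift_out_start = len(writer.chars)  # position where a base-64 section starts
writer.write('\u0100')               # character from the unfinished section
print(kind(writer.maxchar))          # widened while the section was written
writer.truncate(shift_out_start)     # back off output and narrow again
print(kind(writer.maxchar))
```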
msg210725 - Author: Larry Hastings (larry) (Python committer) Date: 2014-02-09 06:34
This checkin appears to be causing a regression in the Windows buildbots.

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/4040

test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed
ok

======================================================================
ERROR: test_readline (test.test_codecs.CP65001Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 157, in test_readline
    self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 136, in readalllines
    line = reader.readline(size=size, keepends=keepends)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 548, in readline
    data = self.read(readsize, firstline=True)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 494, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

----------------------------------------------------------------------
Ran 206 tests in 5.912s
msg210726 - Author: Larry Hastings (larry) (Python committer) Date: 2014-02-09 06:58
And to be clear: I'm currently waiting on this before tagging 3.4rc1. If someone who understands the issue could fix this soon, I would appreciate it.
msg210731 - Author: Larry Hastings (larry) (Python committer) Date: 2014-02-09 09:14
Marking as closed and opening a new issue as per Serhiy's suggestion.
History
Date User Action Args
2014-02-09 09:14:58 larry set status: open -> closed; resolution: fixed; messages: + msg210731; stage: needs patch -> resolved
2014-02-09 06:58:06 larry set messages: + msg210726
2014-02-09 06:43:58 larry set status: closed -> open; resolution: fixed -> (no value); stage: resolved -> needs patch
2014-02-09 06:34:12 larry set messages: + msg210725
2014-02-08 12:13:36 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg210620; stage: patch review -> resolved
2014-02-08 12:09:11 python-dev set nosy: + python-dev; messages: + msg210619
2014-02-08 10:39:27 haypo set messages: + msg210612
2014-02-08 09:37:24 ncoghlan set messages: + msg210599
2014-02-07 17:50:13 serhiy.storchaka set stage: needs patch -> patch review
2014-02-07 17:49:30 serhiy.storchaka set files: + issue20538-3.3.patch, issue20538-3.4.patch; keywords: + patch; messages: + msg210503
2014-02-07 14:14:13 ncoghlan set priority: high -> release blocker; nosy: + larry, ncoghlan, georg.brandl; messages: + msg210472; keywords: + buildbot, 3.4regression
2014-02-07 14:13:22 ncoghlan link issue20542 superseder
2014-02-07 09:34:16 serhiy.storchaka link issue20520 dependencies
2014-02-07 09:32:08 serhiy.storchaka create