Issue 20538: Segfault in UTF-7 incremental decoder

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64737

classification

Title:	Segfault in UTF-7 incremental decoder
Type:	crash	Stage:	resolved
Components:	Interpreter Core, Unicode	Versions:	Python 3.3, Python 3.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, georg.brandl, larry, ncoghlan, python-dev, serhiy.storchaka, vstinner
Priority:	release blocker	Keywords:	3.4regression, buildbot, patch

Created on 2014-02-07 09:32 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue20538-3.3.patch	serhiy.storchaka, 2014-02-07 17:49		review
issue20538-3.4.patch	serhiy.storchaka, 2014-02-07 17:49		review

Messages (10)
msg210444 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-02-07 09:32
The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent unicode string.

Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)
$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)
$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)

This happens because _PyUnicodeWriter moves its position back to before the unfinished base-64 section, but its buffer has already been widened for the characters in that section.

    if (inShift) {
        writer.pos = shiftOutStart; /* back off output */
        consumed = startinpos;
    }

As a result, _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.

This bug causes a lot of crashes on the buildbots. E.g.:

http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446
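The three inputs above can also be checked from Python. The following is a minimal sketch, assuming an interpreter that already includes the fix: an unfinished base-64 section should be left unconsumed, and the returned string should contain only the characters decoded before it. The helper name `decode_partial` is hypothetical and exists only for this illustration.

```python
import codecs


def decode_partial(data):
    """Decode UTF-7 bytes non-finally; return (text, bytes_consumed)."""
    # `final` defaults to False, so a trailing unfinished base-64 section
    # is not an error: the decoder backs off to the '+' that opened it.
    return codecs.utf_7_decode(data, 'strict')


# Inputs from the report; each one ends inside a base-64 section.
samples = [b'a+AIA', b'+AIA-+AQA', b'+AQA-+2ADcAA']
results = [decode_partial(s) for s in samples]

# The incremental decoder buffers the unfinished section and completes it
# once the closing '-' arrives.
decoder = codecs.getincrementaldecoder('utf-7')()
pieces = [decoder.decode(b'a+AIA'), decoder.decode(b'-', final=True)]

if __name__ == '__main__':
    for sample, (text, consumed) in zip(samples, results):
        print(sample, ascii(text), consumed)
    print([ascii(p) for p in pieces])
```

In each sample the consumed count stops at the `+` that opens the unfinished section, which corresponds to `consumed = startinpos` in the C snippet quoted above.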
msg210472 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-02-07 14:14
Note that I added a skip for test_readline in issue 20542 before realising this bug had already been filed.
msg210503 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-02-07 17:49
Here are patches for 3.3 and 3.4 (this is a 3.3+ only bug).
msg210599 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-02-08 09:37
Patches look good to me.
msg210612 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-02-08 10:39
Maybe you can add a new truncate operation to the unicode writer? As you wish. The patch looks good to me.
msg210619 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-02-08 12:09
New changeset 8d40d9cee409 by Serhiy Storchaka in branch '3.3':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
http://hg.python.org/cpython/rev/8d40d9cee409

New changeset e988661e458c by Serhiy Storchaka in branch 'default':
Issue #20538: UTF-7 incremental decoder produced inconsistant string when
http://hg.python.org/cpython/rev/e988661e458c
msg210620 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-02-08 12:13
Thanks Nick and Victor for your reviews. Since there is only one place where truncating the unicode writer is needed, I don't think it is worth a special function.
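To make the truncate idea discussed above concrete, here is a toy Python model. It is not CPython's _PyUnicodeWriter and not the committed patch; the class `ToyWriter`, its methods, and the `kind_for` helper are all hypothetical names invented for this sketch. It only shows why moving the position back without revisiting the buffer width leaves the result inconsistent, and how a truncate step that recomputes the width would avoid that.

```python
def kind_for(maxchar):
    """Return the narrowest storage kind able to hold code point `maxchar`."""
    if maxchar < 0x80:
        return 'ascii'
    if maxchar < 0x100:
        return 'latin1'
    if maxchar < 0x10000:
        return 'ucs2'
    return 'ucs4'


class ToyWriter:
    """Toy model of a writer that widens its buffer as characters arrive."""

    def __init__(self):
        self.chars = []
        self.maxchar = 0  # widest code point the buffer was sized for

    @property
    def kind(self):
        return kind_for(self.maxchar)

    def write(self, ch):
        self.chars.append(ch)
        self.maxchar = max(self.maxchar, ord(ch))  # widen if needed

    def rewind(self, pos):
        # Models the problematic back-off: position moves, width stays.
        del self.chars[pos:]

    def truncate(self, pos):
        # Hypothetical truncate: drop the tail and recompute the width.
        del self.chars[pos:]
        self.maxchar = max(map(ord, self.chars), default=0)

    def is_consistent(self):
        # The kind must match what the remaining characters actually need.
        return self.kind == kind_for(max(map(ord, self.chars), default=0))


if __name__ == '__main__':
    w = ToyWriter()
    w.write('a')
    w.write('\x80')   # character from an unfinished base-64 section
    w.rewind(1)
    print(w.kind, w.is_consistent())
    w.truncate(1)
    print(w.kind, w.is_consistent())
```

The model mirrors the `a+AIA` case: after the rewind only `a` remains, but the writer still reports a non-ASCII kind until the width is recomputed.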
msg210725 - (view)	Author: Larry Hastings (larry) *	Date: 2014-02-09 06:34
This checkin appears to be causing a regression in the Windows buildbots.

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/4040

test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed
ok

======================================================================
ERROR: test_readline (test.test_codecs.CP65001Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 157, in test_readline
    self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 136, in readalllines
    line = reader.readline(size=size, keepends=keepends)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 548, in readline
    data = self.read(readsize, firstline=True)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 494, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

----------------------------------------------------------------------
Ran 206 tests in 5.912s
msg210726 - (view)	Author: Larry Hastings (larry) *	Date: 2014-02-09 06:58
And to be clear: I'm currently waiting on this before tagging 3.4rc1. If someone who understands the issue could fix this soon, I would appreciate it.
msg210731 - (view)	Author: Larry Hastings (larry) *	Date: 2014-02-09 09:14
Marking as closed and opening a new issue as per Serhiy's suggestion.

History
Date	User	Action	Args
2022-04-11 14:57:58	admin	set	github: 64737
2014-02-09 09:14:58	larry	set	status: open -> closed resolution: fixed messages: + msg210731 stage: needs patch -> resolved
2014-02-09 06:58:06	larry	set	messages: + msg210726
2014-02-09 06:43:58	larry	set	status: closed -> open resolution: fixed -> (no value) stage: resolved -> needs patch
2014-02-09 06:34:12	larry	set	messages: + msg210725
2014-02-08 12:13:36	serhiy.storchaka	set	status: open -> closed resolution: fixed messages: + msg210620 stage: patch review -> resolved
2014-02-08 12:09:11	python-dev	set	nosy: + python-dev messages: + msg210619
2014-02-08 10:39:27	vstinner	set	messages: + msg210612
2014-02-08 09:37:24	ncoghlan	set	messages: + msg210599
2014-02-07 17:50:13	serhiy.storchaka	set	stage: needs patch -> patch review
2014-02-07 17:49:30	serhiy.storchaka	set	files: + issue20538-3.3.patch, issue20538-3.4.patch keywords: + patch messages: + msg210503
2014-02-07 14:14:13	ncoghlan	set	priority: high -> release blocker nosy: + larry, ncoghlan, georg.brandl messages: + msg210472 keywords: + buildbot, 3.4regression
2014-02-07 14:13:22	ncoghlan	link	issue20542 superseder
2014-02-07 09:34:16	serhiy.storchaka	link	issue20520 dependencies
2014-02-07 09:32:08	serhiy.storchaka	create