classification
Title: Crash in str.decode() with special error handler
Type: crash Stage: patch review
Components: Interpreter Core Versions: Python 3.5, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-01-25 23:16 by serhiy.storchaka, last changed 2015-02-02 11:25 by vstinner. This issue is now closed.

Files
File name Uploaded Description
unicode_decode_call_errorhandler_writer.patch serhiy.storchaka, 2015-01-25 23:16
Messages (7)
msg234705 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-25 23:16
A debug build crashes in some circumstances in str.decode() when the error handler produces a replacement string that is longer than the malformed data. For example, the backslashreplace error handler produces a 4-character string for every illegal byte. All other standard error handlers produce no more than 1 character for every illegal unit.

Here is a patch which fixes this issue. I'll commit it without review because buildbots are broken without it. This issue is open for reference and post-commit review.
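
As an illustration of the condition described above, here is a minimal sketch of a decode error handler whose replacement is longer than the bytes it replaces. The handler name 'example.expand' and the function expand_handler are hypothetical and are not part of the patch; only codecs.register_error() and the UnicodeDecodeError attributes are standard. On a fixed interpreter this simply decodes; it is not meant to reproduce the crash.

```python
import codecs


def expand_handler(exc):
    # Hypothetical handler for illustration: replace each undecodable
    # byte with a 6-character marker such as '<0x80>'.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = ''.join('<0x%02x>' % b for b in bad)
    # Return the replacement and the position where decoding resumes.
    return replacement, exc.end


# 'example.expand' is an assumed name chosen for this sketch.
codecs.register_error('example.expand', expand_handler)

data = b'\x80abc'
result = data.decode('utf-8', 'example.expand')
print(len(data), len(result), result)
```

The output is longer than the input, which is exactly the case in which a buffer sized from the input length alone is too small.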
msg234707 - (view) Author: Roundup Robot (python-dev) Date: 2015-01-25 23:27
New changeset 2de90090e486 by Serhiy Storchaka in branch '3.4':
Issue #23321: Fixed a crash in str.decode() when error handler returned
https://hg.python.org/cpython/rev/2de90090e486

New changeset 1cd68b3c46aa by Serhiy Storchaka in branch 'default':
Issue #23321: Fixed a crash in str.decode() when error handler returned
https://hg.python.org/cpython/rev/1cd68b3c46aa
msg234725 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-01-26 08:40
> Debugging build crashes in some circumstances in str.decode() (...) buildbots are broken without it

Is it a regression? Would it be possible to identify the changeset
responsible for the regression?
msg234731 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-26 10:26
I think the changeset which made decoders use _PyUnicodeWriter (issue16311)
is responsible for the regression.

For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace').

The writer reserves a string buffer of size 4 (every byte produces at most 1
character). The first byte is invalid and is replaced by the 4-character string
'\\x80'. The writer increases min_length but doesn't resize the buffer because
its size is enough to hold the replacement string. But the following writes of
ASCII characters cause a buffer overflow.
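
The arithmetic in this example can be traced with a toy model. This is only a sketch of the bookkeeping described above, not CPython's _PyUnicodeWriter: the function trace_writes is hypothetical, and it assumes, as a simplification, that every byte >= 0x80 is invalid and is replaced by a 4-character string.

```python
def trace_writes(data, replacement_len=4):
    # Toy model: the buffer is sized for one character per input byte.
    allocated = len(data)
    pos = 0
    events = []
    for byte in data:
        # Simplifying assumption: bytes >= 0x80 are treated as invalid
        # and expand to replacement_len characters; ASCII takes 1.
        width = 1 if byte < 0x80 else replacement_len
        pos += width
        # Record the write position and whether it exceeds the buffer.
        events.append((byte, pos, pos > allocated))
    return allocated, events


allocated, events = trace_writes(b'\x80abc')
for byte, end, overflow in events:
    print('byte 0x%02x ends at %d (buffer %d) overflow=%s'
          % (byte, end, allocated, overflow))

# On Python 3.5+, where backslashreplace supports decoding:
decoded = b'\x80abc'.decode('utf-8', 'backslashreplace')
print(decoded, len(decoded))
```

In this model the replacement itself still fits in the 4-character buffer, and it is the three ASCII writes after it that run past the end, matching the description above.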
msg234783 - (view) Author: Roundup Robot (python-dev) Date: 2015-01-26 22:27
New changeset 1e8937861ee3 by Victor Stinner in branch 'default':
Issue #22286, #23321: Fix failing test on Windows code page 932
https://hg.python.org/cpython/rev/1e8937861ee3
msg235160 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-01 11:01
If you have no enhancements to my quick fix, Victor, maybe this issue can be closed.
msg235242 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-02 11:25
I closed the issue.
History
Date                 User              Action  Args
2015-02-02 11:25:23  vstinner          set     messages: + msg235242
2015-02-02 11:25:17  vstinner          set     status: pending -> closed; resolution: fixed
2015-02-01 11:01:24  serhiy.storchaka  set     status: open -> pending; messages: + msg235160
2015-01-26 22:27:25  python-dev        set     messages: + msg234783
2015-01-26 10:26:42  serhiy.storchaka  set     messages: + msg234731
2015-01-26 08:40:12  vstinner          set     messages: + msg234725
2015-01-26 06:30:29  Arfrever          set     nosy: + Arfrever
2015-01-25 23:27:45  python-dev        set     nosy: + python-dev; messages: + msg234707
2015-01-25 23:16:13  serhiy.storchaka  create