Message 234731 - Python tracker

Author	serhiy.storchaka
Recipients	Arfrever, python-dev, serhiy.storchaka, vstinner
Date	2015-01-26.10:26:41
Message-id	<129300265.9jKHJZWiYC@raxxla>
In-reply-to	<CAMpsgwYgAVXE=H=yELMJAjjgnyZ5-7tXaOkcBQLoUWoyoj9_uQ@mail.gmail.com>

Content
I think the changeset that made the decoders use _PyUnicodeWriter (issue16311) is responsible for the regression.

For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace').

The writer reserves a string buffer of size 4 (every byte produces at most 1 character). The first byte is invalid and is replaced by the 4-character string '\\x80'. The writer increases min_length but doesn't resize the buffer, because the buffer is already large enough to hold the replacement string. The subsequent writes of the ASCII characters then overflow the buffer.
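
The arithmetic above can be sketched in pure Python. This is only a toy model of the bookkeeping, not CPython's _PyUnicodeWriter: the function `simulate` and its variables are hypothetical names, and it assumes for simplicity that every byte >= 0x80 is invalid and is replaced by a 4-character escape.

```python
# Toy model of the buffer accounting described above.
# NOT the real _PyUnicodeWriter; all names here are hypothetical.

def simulate(data, replacement_len=4):
    """Return (capacity, chars_written, first_overflow_index).

    Assumption: every byte >= 0x80 is treated as invalid and expands to
    `replacement_len` characters (e.g. '\\x80'); ASCII bytes produce 1.
    """
    capacity = len(data)   # initial reservation: at most 1 char per byte
    pos = 0                # next write position in the buffer
    overflow_at = None     # first index written past the reserved buffer
    for byte in data:
        width = 1 if byte < 0x80 else replacement_len
        if overflow_at is None and pos + width > capacity:
            overflow_at = pos
        pos += width       # the buffer is never grown in this model
    return capacity, pos, overflow_at


capacity, written, overflow_at = simulate(b'\x80abc')
# The replacement exactly fills the 4 reserved slots, so 'a' lands at index 4.
print(capacity, written, overflow_at)

# On an interpreter without the bug (3.5+, where 'backslashreplace' is
# supported for decoding), the result needs 7 characters, not 4.
decoded = b'\x80abc'.decode('utf-8', 'backslashreplace')
print(len(decoded))
```

The model shows why the check "is the buffer big enough for this one write?" passes for the replacement string while the total required length (7) already exceeds the reservation (4).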

History
Date	User	Action	Args
2015-01-26 10:26:42	serhiy.storchaka	set	recipients: + serhiy.storchaka, vstinner, Arfrever, python-dev
2015-01-26 10:26:42	serhiy.storchaka	link	issue23321 messages
2015-01-26 10:26:41	serhiy.storchaka	create