Author serhiy.storchaka
Recipients Arfrever, python-dev, serhiy.storchaka, vstinner
Date 2015-01-26.10:26:41
Message-id <129300265.9jKHJZWiYC@raxxla>
In-reply-to <CAMpsgwYgAVXE=H=yELMJAjjgnyZ5-7tXaOkcBQLoUWoyoj9_uQ@mail.gmail.com>
Content
I think the changeset that made the decoders use _PyUnicodeWriter (issue16311) 
is responsible for the regression.

For example, consider b'\x80abc'.decode('utf-8', 'backslashreplace').

The writer reserves a string buffer of size 4 (every byte produces at most 1 
character). The first byte is invalid and is replaced by the 4-character string 
'\\x80'. The writer increases min_length but doesn't resize the buffer, because 
the buffer is still large enough to hold the replacement string. The following 
writes of the ASCII characters then overflow the buffer.
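
The arithmetic can be checked with a toy model in Python. This is only a sketch of the bookkeeping described above, not the real _PyUnicodeWriter: the ToyWriter class and its fits() method are hypothetical, and the exact amount by which min_length grows is my assumption.

```python
# Toy model of the scenario above. ToyWriter is hypothetical and only
# tracks sizes; it is not CPython's _PyUnicodeWriter.
class ToyWriter:
    def __init__(self, size):
        self.size = size        # characters actually allocated
        self.pos = 0            # characters written so far
        self.min_length = size  # estimate of the final length

    def fits(self, n):
        """Return True if n more characters fit in the allocated buffer."""
        return self.pos + n <= self.size


data = b'\x80abc'
writer = ToyWriter(len(data))   # 4: at most one character per input byte

replacement = '\\x80'           # 4 characters replace the 1 invalid byte
# Assumption: the estimate grows by the extra characters (4 - 1 = 3).
writer.min_length += len(replacement) - 1

# The replacement alone still fits in 4 characters, so nothing is resized.
resized = not writer.fits(len(replacement))
writer.pos += len(replacement)

# The remaining ASCII bytes 'abc' now have no room left.
tail = data[1:].decode('ascii')
overflow = not writer.fits(len(tail))
needed = writer.pos + len(tail)

print(resized, overflow, writer.size, needed)
```

In this model the buffer holds 4 characters while the full result needs 7, which matches the overflow described above.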
History
Date                 User              Action  Args
2015-01-26 10:26:42  serhiy.storchaka  set     recipients: + serhiy.storchaka, vstinner, Arfrever, python-dev
2015-01-26 10:26:42  serhiy.storchaka  link    issue23321 messages
2015-01-26 10:26:41  serhiy.storchaka  create