Message 271461 - Python tracker

Author	vstinner
Recipients	RalfM, ezio.melotti, vstinner
Date	2016-07-27.16:33:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1469637202.95.0.161137182252.issue24214@psf.upfronthosting.co.za>
In-reply-to

Content
The attached patch fixes the UTF-8 decoder so that incremental decoding works correctly with the surrogatepass error handler.

The bug occurs when b'\xed\xa4\x80' is decoded in two parts: the first two bytes (b'\xed\xa4'), and then the last byte (b'\x80').

It works as expected if we decode the first byte (b'\xed') and then the last two bytes (b'\xa4\x80').
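
To make the two cases concrete, here is a minimal sketch that feeds both splits to the standard incremental decoder. The helper name decode_in_chunks is only an illustration and is not part of the patch. On an affected interpreter I would expect the second split to raise UnicodeDecodeError, so the sketch catches that instead of assuming an outcome.

```python
import codecs


def decode_in_chunks(chunks, errors="surrogatepass"):
    """Feed byte chunks to a UTF-8 incremental decoder and join the output."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush the decoder so that any incomplete trailing sequence is reported.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


DATA = b"\xed\xa4\x80"  # UTF-8-style encoding of the lone surrogate U+D900

# Reference result: decode all three bytes in one call.
expected = DATA.decode("utf-8", "surrogatepass")

# Split after the first byte: the case reported to work.
one_then_two = decode_in_chunks([DATA[:1], DATA[1:]])

# Split after the second byte: the case reported to fail.
try:
    two_then_one = decode_in_chunks([DATA[:2], DATA[2:]])
except UnicodeDecodeError as exc:
    two_then_one = None
    print("two-then-one split failed:", exc)

# ascii() keeps the output printable even though the result is a lone surrogate.
print(ascii(expected), ascii(one_then_two), ascii(two_then_one))
```

With the fix applied, all three values should be the same one-character string.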

My patch tries to preserve the performance of the UTF-8/strict decoder.

@Serhiy: Would you mind reviewing my patch, since you helped design the fast UTF-8 decoder?

History
Date	User	Action	Args
2016-07-27 16:33:23	vstinner	set	recipients: + vstinner, ezio.melotti, RalfM
2016-07-27 16:33:22	vstinner	set	messageid: <1469637202.95.0.161137182252.issue24214@psf.upfronthosting.co.za>
2016-07-27 16:33:22	vstinner	link	issue24214 messages
2016-07-27 16:33:22	vstinner	create