Author xiang.zhang
Recipients serhiy.storchaka, vstinner, xiang.zhang
Date 2016-10-30.07:45:50
Message-id <1477813551.36.0.0620162768049.issue28561@psf.upfronthosting.co.za>
In-reply-to
Content
In utf8_encoder, when a codec error handler returns a string with non-ASCII characters, the encoder raises UnicodeEncodeError, but the reported start and end positions are not accurate. This looks like an oversight as the code evolved. Previously, utf8_encoder recognized only one surrogate character at a time. Since 2b5357b38366, it tries to recognize as many consecutive surrogates as possible at a time, but the error raised in this case was not updated to report that whole range. The patch also includes some cleanup.
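The situation described above can be reproduced from Python with a small sketch. The handler name `non_ascii_replacement` and the helper `reported_range` are hypothetical, chosen only for illustration. The sketch registers an error handler that returns a non-ASCII replacement string, which the UTF-8 encoder rejects, and then inspects the `start` and `end` attributes of the resulting exception. The exact values depend on whether the interpreter includes the fix, so no specific output is claimed here.

```python
import codecs


def non_ascii_replacement(exc):
    # Hypothetical handler: returns a non-ASCII str replacement,
    # which the UTF-8 encoder refuses to accept.
    if isinstance(exc, UnicodeEncodeError):
        return ("\xe9", exc.end)
    raise exc


codecs.register_error("non_ascii_replacement", non_ascii_replacement)


def reported_range(text):
    """Return the (start, end) reported by the encoder's error, or None."""
    try:
        text.encode("utf-8", "non_ascii_replacement")
    except UnicodeEncodeError as err:
        return err.start, err.end
    return None


# Two consecutive lone surrogates at indices 1 and 2.
text = "a\ud800\udc00b"
result = reported_range(text)
print(result)
```

If the reported range covers the whole run of surrogates, `text[start:end]` contains both surrogate characters; a narrower range would indicate the imprecise positions this issue describes.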
History
Date | User | Action | Args
2016-10-30 07:45:51 | xiang.zhang | set | recipients: + xiang.zhang, vstinner, serhiy.storchaka
2016-10-30 07:45:51 | xiang.zhang | set | messageid: <1477813551.36.0.0620162768049.issue28561@psf.upfronthosting.co.za>
2016-10-30 07:45:51 | xiang.zhang | link | issue28561 messages
2016-10-30 07:45:50 | xiang.zhang | create |