classification
Title: Report surrogate characters range in utf8_encoder
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: python-dev, serhiy.storchaka, vstinner, xiang.zhang
Priority: normal
Keywords: patch

Created on 2016-10-30 07:45 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name, Uploaded
utf8_encoder.patch, xiang.zhang, 2016-10-30 07:45
utf8_encoder_v2.patch, xiang.zhang, 2016-10-30 08:55
Messages (3)
msg279712 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-10-30 07:45
In utf8_encoder, when a codec error handler returns a string with non-ASCII characters, the encoder raises UnicodeEncodeError, but the reported start and end positions are not quite right. This looks like an oversight left over from earlier changes. Previously, utf8_encoder recognized only one surrogate character at a time. Since 2b5357b38366, it tries to recognize as many as possible at a time. The patch also includes some cleanup.
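For readers who want to observe the positions involved, here is a minimal sketch rather than the test from the patch. It assumes a hypothetical error handler registered under the made-up name "demo_non_ascii" and an arbitrary sample string containing two adjacent lone surrogates. The handler records the range it is given and returns a non-ASCII replacement string, which the UTF-8 encoder is expected to reject. The exact ranges reported may differ between Python versions, so the sketch prints them instead of assuming them.

```python
import codecs

# (start, end) ranges handed to the hypothetical handler, in call order.
seen = []


def non_ascii_replacement(exc):
    """Hypothetical handler: record the error range, return a non-ASCII str."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    seen.append((exc.start, exc.end))
    # "\xe9" is outside ASCII; resume encoding after the reported range.
    return ("\xe9", exc.end)


# "demo_non_ascii" is an illustrative name, not part of the standard library.
codecs.register_error("demo_non_ascii", non_ascii_replacement)

# Arbitrary sample: two adjacent lone surrogates between ASCII letters.
text = "a\udc80\udc81b"


def try_encode(errors):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc


strict_result = try_encode("strict")
custom_result = try_encode("demo_non_ascii")

for name, result in (("strict", strict_result), ("demo_non_ascii", custom_result)):
    if isinstance(result, UnicodeEncodeError):
        print(name, "raised:", result.reason, "start =", result.start, "end =", result.end)
    else:
        print(name, "encoded to:", result)

print("ranges seen by the handler:", seen)
```

Comparing the range passed to the handler with the range carried by the final exception shows whether the two agree on the interpreter being tested.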
msg279728 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-30 16:26
New changeset 542065b03c10 by Serhiy Storchaka in branch '3.6':
Issue #28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.
https://hg.python.org/cpython/rev/542065b03c10

New changeset ee3670d9bda6 by Serhiy Storchaka in branch 'default':
Issue #28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.
https://hg.python.org/cpython/rev/ee3670d9bda6
msg279729 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-30 16:28
Thanks, Xiang. Yes, this is all a follow-up to issue25267.
History
Date, User, Action, Args
2022-04-11 14:58:38, admin, set, github: 72747
2016-10-30 16:28:28, serhiy.storchaka, set, status: open -> closed; resolution: fixed; messages: + msg279729; stage: patch review -> resolved
2016-10-30 16:26:18, python-dev, set, nosy: + python-dev; messages: + msg279728
2016-10-30 08:55:48, xiang.zhang, set, files: + utf8_encoder_v2.patch
2016-10-30 08:43:27, serhiy.storchaka, set, assignee: serhiy.storchaka; components: + Interpreter Core; versions: + Python 3.6, Python 3.7
2016-10-30 07:45:51, xiang.zhang, create