Optimize UTF-8 decoder with error handlers #69488
Comments
Issue bpo-24870 optimized the ASCII decoder with error handlers (new changeset 3c430259873e by Victor Stinner in branch 'default'). We should also optimize the UTF-8 decoder with error handlers. I will work on a patch in the next few days.
Here is a first patch. It is written to keep the best performance for valid UTF-8 encoded strings, while speeding up the decoding of strings with a few undecodable bytes.
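For readers unfamiliar with the error handlers in question, here is a small sketch of what "a string with a few undecodable bytes" looks like when decoded with each handler. The sample input and the `decode_with` helper are illustrative assumptions, not part of the patch.

```python
# Illustrative input (an assumption, not taken from the patch):
# mostly valid UTF-8 with a single undecodable byte, 0xff, in the middle.
data = b"abc\xffdef"


def decode_with(handler):
    """Decode the sample bytes as UTF-8 using the given error handler."""
    return data.decode("utf-8", errors=handler)


handlers = ("ignore", "replace", "surrogateescape", "backslashreplace")
results = {handler: decode_with(handler) for handler in handlers}

for handler, text in results.items():
    # ascii() keeps the output printable even for lone surrogates.
    print(f"{handler:>16}: {ascii(text)}")

# The default "strict" handler raises instead of producing a string.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Each handler only does extra work at the position of the invalid byte; the valid bytes around it go through the normal decoding path.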
Results of the microbenchmark on the UTF-8 decoder. As expected, performance on valid UTF-8 is unchanged, which was an important goal for me. Decoding with the error handlers optimized by the patch is *much* faster. backslashreplace is still slow, because I didn't optimize it.
[Benchmark tables: common platform, platform of campaign before, platform of campaign after; contents not recoverable.]
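A rough way to reproduce this kind of comparison is sketched below. This is not the benchmark script used for the results above: the input sizes, the error density (one invalid byte per 100 bytes) and the `bench` helper are all assumptions chosen for illustration.

```python
import timeit

# Hypothetical inputs: 100,000 bytes of valid text, and the same amount
# of data with one undecodable byte (0xff) every 100 bytes.
VALID = b"x" * 100_000
FEW_ERRORS = (b"x" * 99 + b"\xff") * 1_000


def bench(data, handler, number=20):
    """Return the best per-call time in seconds for decoding `data`."""
    timer = timeit.Timer(lambda: data.decode("utf-8", errors=handler))
    return min(timer.repeat(repeat=3, number=number)) / number


timings = {}
for handler in ("ignore", "replace", "surrogateescape", "backslashreplace"):
    timings[handler] = (bench(VALID, handler), bench(FEW_ERRORS, handler))

for handler, (t_valid, t_errors) in timings.items():
    print(f"{handler:>16}: valid {t_valid * 1e6:8.1f} us, "
          f"few errors {t_errors * 1e6:8.1f} us")
```

Running it on builds from before and after the change should show whether the "valid" column stays flat while the "few errors" column drops; absolute numbers will vary by machine and Python version.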
New changeset 3152e4038d97 by Victor Stinner in branch 'default':
I pushed my optimization. I am closing the issue.
New changeset 5b9ffea7e7c3 by Victor Stinner in branch 'default':
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.