Faster unicode-escape and raw-unicode-escape codecs #60538
Comments
The proposed patch optimizes the unicode-escape and raw-unicode-escape codecs. The decoders are still slower than in 3.2, but much faster than in 3.3. Further speedup is possible with the use of stringlib, but I think that this is enough. The code is unified and simplified (251 insertions, 345 deletions).

Benchmark results (on AMD Athlon 64 X2 4600+):

Py2.7          Py3.2          Py3.3          Py3.4+patch
193 (+11%)     325 (-34%)     66 (+224%)     214     decode unicode-escape 'A'*10000
558 (-62%)     427 (-50%)     82 (+161%)     214     decode raw-unicode-escape 'A'*10000
182 (+137%)    215 (+101%)    148 (+192%)    432     encode unicode-escape 'A'*10000
332 (+1459%)   330 (+1468%)   333 (+1454%)   5175    encode raw-unicode-escape 'A'*10000
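For readers who want to reproduce numbers of this kind, the following is a minimal sketch of a throughput measurement for the four benchmarked operations. It is not the script used in this issue. The helper name `throughput`, the repeat counts, and the choice of MB/s as the unit are assumptions made for illustration only.

```python
import timeit

def throughput(func, nbytes, repeat=5, number=100):
    """Best-of-`repeat` throughput in MB/s (unit chosen for this sketch)."""
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return nbytes * number / best / 1e6

text = 'A' * 10000
cases = {}
for codec in ('unicode-escape', 'raw-unicode-escape'):
    data = text.encode(codec)
    # Default arguments bind the current codec and payload to each lambda.
    cases['decode ' + codec] = (lambda d=data, c=codec: d.decode(c), len(data))
    cases['encode ' + codec] = (lambda t=text, c=codec: t.encode(c), len(text))

results = {name: throughput(func, size) for name, (func, size) in cases.items()}

if __name__ == '__main__':
    for name, mbps in results.items():
        print(f"{mbps:10.0f}  {name} 'A'*10000")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute values depend heavily on the machine and the interpreter build, so only ratios between builds measured on the same machine are meaningful.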
The unicode-escape and raw-unicode-escape decoders now use the PyUnicodeWriter API. Can you please compare the performance of your patch with the PyUnicodeWriter API?

The decoders overallocate the buffer. According to a comment in the decoder, overallocating is never needed (and will be slower). Your patch does not overallocate the buffer. The decoder should probably be adjusted to disable overallocation.

Can you please update your patch on the encoder to the latest development version?
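One plausible reading of "overallocating is never needed" is that the decoded string can never contain more characters than the input has bytes, because every escape sequence consumes at least two bytes and produces a single character. The sketch below checks that property from Python on a handful of hand-picked inputs. The helper `max_growth` and the sample list are hypothetical, and the check says nothing about how the C decoder actually sizes its buffer.

```python
def max_growth(codec, samples):
    """Largest ratio of decoded characters to input bytes over the samples."""
    return max(len(raw.decode(codec)) / len(raw) for raw in samples)

samples = [
    b'A' * 10,                     # plain ASCII, one character per byte
    b'\\n\\t\\\\',                 # short backslash escapes
    b'\\x41\\u20ac\\U0001f600',    # hex, 16-bit and 32-bit escapes
    b'\\N{EURO SIGN}',             # named escape (unicode-escape only)
    b'caf\xe9',                    # a non-ASCII byte, decoded as Latin-1
]

growth = {codec: max_growth(codec, samples)
          for codec in ('unicode-escape', 'raw-unicode-escape')}

if __name__ == '__main__':
    print(growth)
```

If the ratio never exceeds 1, a buffer of one character per input byte is always large enough, which is consistent with the comment cited above.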
Victor's patch harvested most of the fruit, but there is room for further optimization.

Benchmark results for the new patch:

Py3.2          Py3.3          Py3.6          Py3.6+patch
451 (-47%)     77 (+209%)     140 (+70%)     238     decode unicode-escape 'A'*10000
559 (-62%)     88 (+143%)     194 (+10%)     214     decode raw-unicode-escape 'A'*10000
195 (+136%)    109 (+323%)    258 (+79%)     461     encode unicode-escape 'A'*10000
391 (+1310%)   333 (+1556%)   575 (+859%)    5514    encode raw-unicode-escape 'A'*10000
The unicode escape encoders were modified by issue bpo-25353 to use the _PyBytesWriter API. Sadly, I didn't benchmark my change before pushing it :-/ Your patch basically reverts my change.
I'm surprised that the revert makes the raw-unicode-escape encoder so much faster. Does it mean that the _PyBytesWriter API is so inefficient? The most efficient case for _PyBytesWriter is when you only call _PyBytesWriter_Alloc() and _PyBytesWriter_Finish() and the output string has exactly the allocated length. That should be the case when 'A'*10000 is encoded, no?
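The size expectation in the question above can be checked from Python: for ASCII-only input such as 'A'*10000 the encoded output has the same length as the input, while other characters expand. The helper `encoded_len` is a hypothetical name used only for this sketch, and the check says nothing about how _PyBytesWriter sizes or resizes its buffer internally.

```python
def encoded_len(text, codec):
    """Number of bytes produced by encoding `text` with `codec`."""
    return len(text.encode(codec))

ascii_text = 'A' * 10000

# ASCII letters are copied through unchanged by both codecs.
ascii_sizes = {codec: encoded_len(ascii_text, codec)
               for codec in ('unicode-escape', 'raw-unicode-escape')}

# Bytes produced per character for a few non-trivial characters.
expansion = {
    ('raw-unicode-escape', '\xe9'): encoded_len('\xe9', 'raw-unicode-escape'),
    ('raw-unicode-escape', '\u20ac'): encoded_len('\u20ac', 'raw-unicode-escape'),
    ('raw-unicode-escape', '\U0001f600'): encoded_len('\U0001f600', 'raw-unicode-escape'),
    ('unicode-escape', '\n'): encoded_len('\n', 'unicode-escape'),
    ('unicode-escape', '\xe9'): encoded_len('\xe9', 'unicode-escape'),
}

if __name__ == '__main__':
    print(ascii_sizes)
    for (codec, char), size in expansion.items():
        print(f'{codec:20} U+{ord(char):04X} -> {size} byte(s)')
```

An encoder that reserves one byte per input character therefore never has to grow its buffer for this benchmark input. Only inputs containing escaped characters need more space, up to 10 bytes per character for a `\UXXXXXXXX` escape.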
I rebased faster_unicode_escape_4.patch and made tiny changes:
You can benchmark it now by checking out revisions with your patch and just [...]
I used scripts from https://bitbucket.org/storchaka/cpython-stuff/src/default/
I don't remember all details, but it seems that after applying all [...]
The awesome difference in encoding for ASCII-only data is not related to using [...]
This is not the correct name. This macro is used for writing non-ASCII characters
Did you benchmark this change? I'm afraid that this inflates the executable code size
I consider that readability (maintainability) matters more than such a micro-optimization.
New changeset ad5a28ace615 by Victor Stinner in branch 'default':
Since it's almost time for 3.6 beta 1, I chose to push the change right now. I'm sure that it's faster; I trust your benchmarks ;-) Thanks Serhiy for this nice enhancement.
Oh, I fixed this in the pushed change.
Thanks Victor! I benchmarked your patch. There is no regression in comparison with my patch. In a few cases your patch is even faster!

Unpatched      Patch v.4      Patch v.5
148 (+76%)     235 (+11%)     260     decode unicode-escape 'A'*10000
197 (+9%)      214 (+0%)      215     decode raw-unicode-escape 'A'*10000
269 (+73%)     424 (+10%)     465     encode unicode-escape 'A'*10000
578 (+853%)    5672 (-3%)     5507    encode raw-unicode-escape 'A'*10000

Could you please add NEWS and What's New entries?
Feel free to document the change. It's not my patch, it's yours :-)