Add _PyBytesWriter API #61942
In Python 3.3, I added the _PyUnicodeWriter API to factorize the code handling a Unicode "buffer": the code to allocate memory and resize the buffer if needed. I propose to do the same with a new _PyBytesWriter API. The API is very similar to _PyUnicodeWriter:
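To illustrate the idea, here is a minimal standalone sketch of such a writer. This is not the actual patch or CPython code; the names (`byteswriter`, `bw_init`, `bw_resize`, `bw_finish`) and the growth policy are hypothetical, modelled on the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone approximation of a bytes "writer":
   it owns a growable buffer and hands out a write position. */
typedef struct {
    char *buf;        /* start of the allocated buffer */
    size_t allocated; /* current allocation size */
    int overallocate; /* overallocate on resize to amortize growth */
} byteswriter;

static int bw_init(byteswriter *w, size_t size) {
    w->buf = malloc(size ? size : 1);
    w->allocated = size;
    w->overallocate = 0;
    return w->buf ? 0 : -1;
}

/* Ensure room for `extra` more bytes past `str` (the current write
   position); returns the possibly-relocated write position. */
static char *bw_resize(byteswriter *w, char *str, size_t extra) {
    size_t used = (size_t)(str - w->buf);
    size_t need = used + extra;
    if (need > w->allocated) {
        if (w->overallocate)
            need += need / 4;   /* +25%, as _PyUnicodeWriter does */
        char *newbuf = realloc(w->buf, need);
        if (newbuf == NULL)
            return NULL;
        w->buf = newbuf;
        w->allocated = need;
    }
    return w->buf + used;
}

/* Report the final size and release ownership of the buffer. */
static char *bw_finish(byteswriter *w, char *str, size_t *size) {
    *size = (size_t)(str - w->buf);
    return w->buf;  /* caller now owns the buffer */
}
```

An encoder would allocate the estimated output size up front, write through the returned pointer, and only call the resize function in the rare case where the estimate turns out to be too small.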
The patch changes the ASCII, Latin-1, UTF-8 and charmap encoders to use the _PyBytesWriter API. A second patch changes the CJK encoders. I did not run a benchmark yet: I wrote the patch to factorize the code, not to make it faster. Notes on performance: |
See also bpo-17694. |
Could you please provide the benchmark results? I am afraid that it may hurt performance. Results on Windows are especially interesting. |
Your patches don't seem to reduce the line count, so I don't understand the point. |
Here is a benchmark. It looks like the overallocation is not configured correctly for the charmap and UTF-8 encoders: it is always enabled for the UTF-8 encoder (whereas it should only be enabled on the first call to an error handler), and never enabled for the charmap encoder. |
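The intended policy can be sketched as a growth rule: grow exactly to the needed size while no error handler has run, and overallocate afterwards so that repeated errors do not cause one realloc per error. The function below is purely illustrative (the name `grow` and the +25% factor are assumptions, not the patch's code).

```c
#include <stddef.h>

/* Illustrative sketch: compute the new allocation when an encoder
   needs `extra` more bytes after `used` bytes have been written.
   Exact-size growth until the first error handler call; after that,
   overallocate by 25% to amortize future resizes. */
static size_t grow(size_t used, size_t extra, int had_error) {
    size_t need = used + extra;
    if (had_error)              /* overallocate only after an error */
        need += need / 4;
    return need;
}
```

With this rule, the common error-free path never pays for overallocation, and the final shrink-to-fit is a no-op.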
I ran my own benchmarks and don't see any regression beyond random noise. Actually, I even see a speed-up in some cases: ./python -m timeit -s "a = '\x80'*10000" "a.encode()". Before the patch: 29.8 usec per loop; after the patch: 21.5 usec per loop. |
I have added some comments on Rietveld. |
Benchmark script, should be used with: |
New version of the patch:
Performance is almost the same as without the patch; it looks like it is even a little bit better. |
_PyBytesWriter and _PyUnicodeWriter have different use cases. _PyUnicodeWriter is used primarily in formatters, where the resulting size is rarely known in advance, and reallocation in decoders is usually caused by widening the result; _PyBytesWriter is used only in encoders, where we can usually estimate the result size or its upper bound. Resizing happens only in exceptional cases, when an error handler is called. The patch contains a special case for writing only one bytes object. This is a very unlikely case: it happens only when an encoded string contains only one illegal character. I think this case is not worth the complication. |
The patch only modifies a few functions to make them use the new _PyBytesWriter API. Other functions could use it too. A few examples:
But I agree that the read-only "hack" can be removed from the _PyBytesWriter API, since the bytes type has no format method (no bytes % args nor bytes.format(args)). |
It may eventually get one, though. |
If a use case for the "read-only hack" comes up, the hack can be added back later. It's better to start with something simple and extend it as new use cases appear. |
I'm not completely satisfied with bytes_writer-2.patch. Most encoders work directly on a pointer (char*). If I want to keep the code using pointers (because it is efficient), I have to resynchronize the writer and the pointer before and after calling writer methods. Here is a new patch implementing a different approach, closer to the current code. Version 3 of the patch has no "pos" attribute: the "position" is now a pointer (str), so encoders can just use this pointer instead of their own. I expect that replacing "*p++ = c;" with "*writer.str++ = c;" will not add significant overhead, especially because writer is a local variable and str is the first attribute of the structure. I hope that the machine code will be exactly the same. |
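A minimal standalone sketch of this version-3 style (hypothetical names and layout, not the actual patch): the write position is the first member of the writer struct, and the encoder loop stores through it directly.

```c
#include <stddef.h>

/* Sketch: `str` is the first member, so `writer.str` is as cheap to
   address as a plain local pointer (hypothetical layout). */
typedef struct {
    char *str;   /* current write position */
    char *buf;   /* start of the buffer */
} pwriter;

/* Toy "encoder" writing through writer.str, i.e. the
   "*writer.str++ = c;" pattern discussed above. */
static size_t encode_upper(pwriter *w, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c >= 'a' && c <= 'z')
            c = (char)(c - 'a' + 'A');
        *w->str++ = c;
    }
    return (size_t)(w->str - w->buf);
}
```

The appeal is that the encoder no longer keeps a private `char *p` that must be synchronized with the writer before every writer call.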
The last patch increases the size of the code substantially. I'm still wondering what the benefits are.
$ hg di --stat
 Include/bytesobject.h      |  90 ++++++++++++++
 Misc/NEWS                  |   3 +
 Objects/bytesobject.c      | 144 ++++++++++++++++++++++
 Objects/stringlib/codecs.h | 109 ++++
 Objects/unicodeobject.c    | 267 ++++++++++++++++------------------- |
"I expect that replacing "*p++ = c;" with "*writer.str++ = c;" would not add significant overhead." I ran some benchmarks: performance is better in some cases, but worse in others. For the "writer.str++" instruction, the new pointer value is written back to memory at each iteration of the loop. I compared with the original code: there GCC emits better code, str is stored in a register and the new value is only written back to memory at the end. It may be an aliasing issue, but I didn't find how to tell GCC that the pointers cannot alias. |
Advantages of the patch.
I'm not convinced that the patch permits writing faster code. According to the assembler, it is the opposite (when "*writer.str++" is used in a loop). I don't know if it's possible to design a more efficient _PyBytesWriter API (to help GCC generate more efficient machine code), nor whether the overhead matters in the "normal" case (bench_encoders.py tests border cases: text with many, many errors). |
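One common workaround for this kind of aliasing-induced store (an illustration, not something proposed in the issue) is to copy writer.str into a local pointer before the loop and write it back once afterwards, so the compiler does not have to assume that each store through the pointer may modify the struct field itself:

```c
#include <stddef.h>

typedef struct {
    char *str;  /* current write position (hypothetical layout) */
} writer_t;

/* Storing through w->str: the compiler must assume each store may
   alias the struct, so it can spill the pointer every iteration. */
static void copy_via_struct(writer_t *w, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++)
        *w->str++ = src[i];
}

/* Caching the pointer in a local lets the compiler keep it in a
   register; it is written back to the struct once, after the loop. */
static void copy_via_local(writer_t *w, const char *src, size_t len) {
    char *p = w->str;
    for (size_t i = 0; i < len; i++)
        *p++ = src[i];
    w->str = p;
}
```

Both functions produce the same result; the difference is only in the machine code the compiler can generate for the loop.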
The _PyBytesWriter API makes the code slower and does not really reduce the number of lines, so I'm closing this issue as invalid. |