New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance _PyUnicodeWriter API to control minimum buffer length without overallocation #61894
Comments
The _PyUnicodeWriter API is used in many functions to create Unicode strings, especially decoders. Performances are not optimal: it is not possible to specify the minimum length of the buffer if the overallocation is disabled. It may help bpo-17693 for example. |
We have this issue triaged for at CPython hackathon in Boston. Here is a patch for the issue. We only tested on Mac OS X 10.8.3, which has zoned allocator, so the memory profile is exactly the same with our without this patch. The running time seems to be slightly better with the patch. The benchmark we used runs for about 5.6 seconds with the patch vs. 5.9 seconds without the patch. We run the benchmark multiple times and the results seem to be consistent. Here are the results of the benchmarking and memory profile testing:
Mem 1535 nodes (6377296 bytes) 1535 nodes (6378144 bytes) The memory profile was measured by the MacOS X 'heap' command. The timings come from attached benchmark module. The original benchmark module is taken from here http://bugs.python.org/file25558/benchmark.py and was modified to test this issue. |
For some reason can't figure out how to attach multiple files. So here is the benchmark module |
I'd like to note that the actual patch was written by Adam.Duston http://bugs.python.org/user17706 I just verified the results, measured the time/memory performance submitted the patch. |
Attached patch changes _PyUnicodeWriter_Init() API: it now only has one argument (the writer). Minimum length and overallocation must be configured using attributes. The problem with the old API was that it was not possible to configure minimum length and overallocation separatly. Disable overallocation in CJK decoders: only set the minimum length. Other changes:
The goal is to delay the first allocation until the first real write to be able to choose correctly the maximum character and the buffer size. If the buffer is allocated before the first write, even the first write must widen and/or enlarge the buffer. |
I don't see how bpo-17694.patch can speedup Python because min_length is zero when overallocation is disabled. It may be noise of the benchmark script. |
PyUnicode_DecodeUnicodeEscape() should set writer.min_length instead of using _PyUnicodeWriter_Prepare(), but the following assertion fails (because writer.size is zero by default):
I don't understand this assertion, so I don't know how to modify it. |
PyUnicode_DecodeCharmap() still uses _PyUnicodeWriter_Prepare() (even with my patch). It may be interesting to benchmark min_length vs prepare. If min_length is not slower, it should be used instead of prepare to avoid widen the buffer if the first written character is non-ASCII, b'\x80'.decode('cp1252') for example. |
New changeset edf029fc9591 by Victor Stinner in branch 'default': |
The commit changes the default value of min_length when overallocation is enabled: it does not use at least 100 characters anymore. It did not directly introduce a bug, but the regression comes from 7ed9993d53b4 (use _PyUnicodeWriter for Unicode decoders). The following commits should fix these issues. changeset: 83435:94d1c3bdb79c Bug introduced by changesets 7ed9993d53b4 and edf029fc9591. changeset: 83434:7eb52460c999 Bug introduced by changeset 7ed9993d53b4. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: