classification
Title: Add _PyBytesWriter API to optimize Unicode encoders
Type: performance Stage:
Components: Unicode Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, haypo, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-10-05 12:01 by haypo, last changed 2015-10-09 12:18 by haypo. This issue is now closed.

Files
File name Uploaded Description
bench_utf8_result.txt haypo, 2015-10-05 12:02
bench_ucs1_result.txt haypo, 2015-10-05 12:04
bytes_writer.patch haypo, 2015-10-05 12:05
Messages (14)
msg252322 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-05 12:01
Attached patch is the first step to optimize Unicode encoders: it adds a _PyBytesWriter API. This API is responsible for using the most efficient buffer depending on the need:

* it's possible to use a small buffer directly allocated on the C stack
* otherwise a Python bytes object is allocated
* it's possible to overallocate the bytes object to reduce the number of calls to _PyBytes_Resize()

The patch only adds the new API; don't expect any speedup. I just added a small optimization: overallocation is disabled in the UCS1 encoder (ASCII and Latin1) for the last write.
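
To make the three buffer strategies above concrete, here is a minimal sketch of a writer that starts in a small fixed buffer, moves to the heap when that buffer is too small, and optionally reserves extra room when it grows. It is not the code from bytes_writer.patch: the names (toy_writer, toy_init, toy_prepare, toy_dealloc), the 16-byte buffer and the 25% growth margin are assumptions made for illustration, and malloc()/realloc() stand in for the bytes object and _PyBytes_Resize().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer, NOT the CPython _PyBytesWriter structure. */
typedef struct {
    char small[16];     /* small buffer; lives on the C stack when the
                           writer itself is a local variable */
    char *heap;         /* NULL while the small buffer is in use */
    size_t allocated;   /* capacity of the active buffer */
    int overallocate;   /* non-zero: reserve extra room when growing */
} toy_writer;

/* Start of the buffer currently in use. */
char *toy_start(toy_writer *w)
{
    return w->heap ? w->heap : w->small;
}

/* Reset the writer and return the initial write position. */
char *toy_init(toy_writer *w)
{
    w->heap = NULL;
    w->allocated = sizeof(w->small);
    w->overallocate = 0;
    return w->small;
}

/* Make room for `extra` more bytes after `p`. Returns the (possibly moved)
   write position, or NULL on allocation failure. Integer overflow of the
   size computation is not handled in this sketch. */
char *toy_prepare(toy_writer *w, char *p, size_t extra)
{
    size_t used = (size_t)(p - toy_start(w));
    size_t needed = used + extra;
    char *heap;

    if (needed <= w->allocated)
        return p;                   /* enough room already */
    if (w->overallocate)
        needed += needed / 4;       /* assumed 25% margin, illustrative only */

    if (w->heap != NULL) {
        heap = realloc(w->heap, needed);
        if (heap == NULL)
            return NULL;
    }
    else {
        heap = malloc(needed);
        if (heap == NULL)
            return NULL;
        memcpy(heap, w->small, used);   /* leave the small buffer */
    }
    w->heap = heap;
    w->allocated = needed;
    assert(used <= w->allocated);
    return heap + used;
}

/* Release the heap buffer, if any. */
void toy_dealloc(toy_writer *w)
{
    free(w->heap);
    w->heap = NULL;
}
```

A caller would keep the pointer returned by toy_init(), call toy_prepare() before each write that may not fit, and clear w->overallocate before the final write so that the last growth does not reserve unused space.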
msg252323 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-05 12:02
Result of bench.py attached to issue #25267: attached bench_utf8_result.txt.

------------------------------------------------------+-------------+---------------
Summary                                               | utf8_before |     utf8_after
------------------------------------------------------+-------------+---------------
ignore: "\udcff" * length                             | 7.63 us (*) |        7.91 us
ignore: "a" * length + "\udcff"                       | 10.7 us (*) |        10.8 us
ignore: ("a" * 99 + "\udcff" * 99) * length           | 2.17 ms (*) |        2.16 ms
ignore: ("\udcff" * 99 + "a") * length                |  843 us (*) |         866 us
ignore: "\udcff" + "a" * length                       | 10.7 us (*) |          11 us
replace: "\udcff" * length                            | 7.87 us (*) |  8.43 us (+7%)
replace: "a" * length + "\udcff"                      | 10.8 us (*) |        10.9 us
replace: ("a" * 99 + "\udcff" * 99) * length          | 2.46 ms (*) |        2.46 ms
replace: ("\udcff" * 99 + "a") * length               |  907 us (*) |         939 us
replace: "\udcff" + "a" * length                      | 10.9 us (*) |          11 us
surrogateescape: "\udcff" * length                    | 14.2 us (*) | 17.2 us (+21%)
surrogateescape: "a" * length + "\udcff"              | 10.6 us (*) |        10.7 us
surrogateescape: ("a" * 99 + "\udcff" * 99) * length  | 3.19 ms (*) |   3.4 ms (+7%)
surrogateescape: ("\udcff" * 99 + "a") * length       | 1.64 ms (*) | 1.87 ms (+13%)
surrogateescape: "\udcff" + "a" * length              | 10.6 us (*) |        10.7 us
surrogatepass: "\udcff" * length                      | 23.1 us (*) |        23.5 us
surrogatepass: "a" * length + "\udcff"                | 10.7 us (*) |        10.8 us
surrogatepass: ("a" * 99 + "\udcff" * 99) * length    | 4.39 ms (*) |        4.44 ms
surrogatepass: ("\udcff" * 99 + "a") * length         | 2.43 ms (*) |        2.47 ms
surrogatepass: "\udcff" + "a" * length                | 10.6 us (*) |        10.8 us
backslashreplace: "\udcff" * length                   | 65.7 us (*) |        64.3 us
backslashreplace: "a" * length + "\udcff"             | 15.7 us (*) |          15 us
backslashreplace: ("a" * 99 + "\udcff" * 99) * length |   12 ms (*) | 15.9 ms (+32%)
backslashreplace: ("\udcff" * 99 + "a") * length      | 11.1 ms (*) | 13.5 ms (+22%)
backslashreplace: "\udcff" + "a" * length             | 16.4 us (*) |  15.1 us (-8%)
------------------------------------------------------+-------------+---------------
Total                                                 | 41.4 ms (*) | 48.3 ms (+17%)
------------------------------------------------------+-------------+---------------
msg252324 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-05 12:04
Results of bench.py attached to issue #25227 (ASCII and Latin1 encoders): attached bench_ucs1_result.txt file.

--------+-------------+-----------
Summary | ucs1_before | ucs1_after
--------+-------------+-----------
ascii   | 1.69 ms (*) |    1.69 ms
latin1  |  1.7 ms (*) |    1.69 ms
--------+-------------+-----------
Total   | 3.39 ms (*) |    3.39 ms
--------+-------------+-----------
msg252325 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-05 12:12
A few months ago, I wrote a previous implementation of the _PyBytesWriter API which embedded the "current pointer" inside the _PyBytesWriter structure. The problem was that GCC produced less efficient code than expected for the hotspot of the encoder.

In the new implementation (attached patch), the "current pointer" is unchanged: it's still a variable local to the encoder function. Instead, the current pointer became a *parameter* to all _PyBytesWriter *functions*.

I don't expect this to affect the performance of encoders for strings that encode without errors (when the code calling error handlers is not used), which is important since we already have very good performance there.
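
The design described above, where the current pointer stays a local variable of the encoder and is only passed to the writer functions, can be sketched as follows. This is an illustration under assumptions, not the patch itself: toy_buffer, toy_buffer_prepare and toy_encode_ascii are hypothetical names, the buffer is a plain malloc() block, and the "<?>" marker is an invented stand-in for whatever an error handler would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer. It deliberately has no "current" field. */
typedef struct {
    char *start;
    size_t allocated;
} toy_buffer;

/* Ensure room for `extra` bytes after p. The caller passes its local
   pointer in and gets the (possibly relocated) pointer back.
   Returns NULL on allocation failure. */
char *toy_buffer_prepare(toy_buffer *b, char *p, size_t extra)
{
    size_t used = (size_t)(p - b->start);
    size_t allocated;
    char *start;

    if (used + extra <= b->allocated)
        return p;
    allocated = used + extra;
    start = realloc(b->start, allocated);
    if (start == NULL)
        return NULL;
    b->start = start;
    b->allocated = allocated;
    return start + used;
}

/* Encode code points to ASCII; anything above 0x7F becomes "<?>".
   Returns a malloc'ed buffer and stores its length in *out_len. */
char *toy_encode_ascii(const uint32_t *in, size_t n, size_t *out_len)
{
    toy_buffer b;
    char *p;
    size_t i;

    b.allocated = n ? n : 1;
    b.start = malloc(b.allocated);
    if (b.start == NULL)
        return NULL;

    p = b.start;                    /* hot pointer: a plain local variable */
    for (i = 0; i < n; i++) {
        char *q;
        if (in[i] < 0x80) {
            *p++ = (char)in[i];     /* fast path never touches `b` */
            continue;
        }
        /* Slow path: 3 bytes for the marker plus one byte for each of the
           n - i - 1 characters that are still to be encoded. */
        q = toy_buffer_prepare(&b, p, 3 + (n - i - 1));
        if (q == NULL) {
            free(b.start);
            return NULL;
        }
        p = q;
        memcpy(p, "<?>", 3);
        p += 3;
    }
    assert((size_t)(p - b.start) <= b.allocated);
    *out_len = (size_t)(p - b.start);
    return b.start;
}
```

Because the fast path only reads and writes through the local p, a compiler is free to keep it in a register; the buffer structure is touched only when a character cannot be encoded.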

_PyBytesWriter is not restricted to the code to allocate the buffer.

--

bytes_writer.patch:

+    char stackbuf[256];

Oh, I forgot to mention this other small optimization. I also added a small buffer allocated on the C stack for the UCS1 encoder (ASCII, Latin1). It may slightly speed up encoding when the output string is smaller than 256 bytes and the error handler is used.

The optimization comes from the very efficient UTF-8 encoder.
msg252335 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-05 16:17
My previous abandoned attempt was issue #17742.

"Add _PyBytesWriter API to optimize Unicode encoders"

Oh, I forgot to mention that it may also be used to optimize bytes % args. More generally, it can help any code generating a bytes object whose length is not known in advance. Said differently: _PyBytesWriter can be used when precomputing the output length would be more expensive than growing the buffer on demand.

str % args now uses _PyUnicodeWriter, but building a Unicode string is even more complex because of the different Unicode "kinds": 1, 2 or 4 bytes per character.
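
As a small illustration of the "kinds" mentioned above: the storage width of a string follows from its largest code point, so a Unicode writer may have to widen its whole buffer when a larger character shows up, which a bytes writer never has to do. The helper names below are hypothetical and only model the rule; they are not CPython functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per character needed to store every code point up to max_char. */
int toy_kind_for_max_char(uint32_t max_char)
{
    assert(max_char <= 0x10FFFF);   /* upper bound of the Unicode range */
    if (max_char <= 0xFF)
        return 1;
    if (max_char <= 0xFFFF)
        return 2;
    return 4;
}

/* Kind required by a whole sequence of code points. */
int toy_kind_for_string(const uint32_t *s, size_t n)
{
    uint32_t max_char = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (s[i] > max_char)
            max_char = s[i];
    }
    return toy_kind_for_max_char(max_char);
}
```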
msg252570 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-08 22:59
New changeset 1a2175149c5e by Victor Stinner in branch 'default':
Issue #25318: Add _PyBytesWriter API
https://hg.python.org/cpython/rev/1a2175149c5e
msg252571 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-08 23:04
Oh, I was surprised to see the same or worse performance for UTF-8/backslashreplace. In fact, I forgot to enable overallocation. With overallocation, it is now faster ;-)

I modified the API to put the "stack buffer" directly inside the _PyBytesWriter structure. I also reworked _PyBytesWriter_Alloc() to call _PyBytesWriter_Prepare(), so _PyBytesWriter_Alloc() now supports overallocation as well. Supporting overallocation at the first allocation (_PyBytesWriter_Alloc) was part of the _PyBytesWriter design; that's why we have _PyBytesWriter_Alloc() *and* _PyBytesWriter_Init(): it's possible to set overallocate=1 between init and alloc.
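
The init/alloc split described above can be sketched with a toy model. The names (demo_writer, demo_init, demo_prepare, demo_alloc) and the 25% margin are assumptions for illustration, not the real implementation; the point is only that routing the first allocation through the same prepare step lets a flag set after init influence it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model of a writer with separate init and alloc steps. */
typedef struct {
    char *start;
    size_t allocated;
    int overallocate;
} demo_writer;

/* Step 1: reset every field, allocate nothing yet. */
void demo_init(demo_writer *w)
{
    w->start = NULL;
    w->allocated = 0;
    w->overallocate = 0;
}

/* Grow so that `size` bytes fit after p; honours w->overallocate.
   Returns the new write position, or NULL on allocation failure. */
char *demo_prepare(demo_writer *w, char *p, size_t size)
{
    size_t used = w->start ? (size_t)(p - w->start) : 0;
    size_t needed = used + size;
    char *start;

    if (needed <= w->allocated)
        return p;
    if (w->overallocate)
        needed += needed / 4;       /* assumed 25% margin, illustrative only */
    start = realloc(w->start, needed);
    if (start == NULL)
        return NULL;
    w->start = start;
    w->allocated = needed;
    return start + used;
}

/* Step 2: the first allocation is just a prepare on an empty writer, so it
   sees whatever the caller set between demo_init() and this call. */
char *demo_alloc(demo_writer *w, size_t size)
{
    assert(w->start == NULL);
    assert(size > 0);
    return demo_prepare(w, NULL, size);
}
```

With this shape, a caller that expects the output to grow sets w.overallocate = 1 right after demo_init(), and a caller that knows the exact size leaves the flag at 0 and gets an exact allocation.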

I pushed my change since it didn't kill performance. It's only a little bit slower, and only on very short encodes (less than 500 ns). In other cases, performance is the same or better.
msg252573 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-08 23:46
New changeset 59f4806a5add by Victor Stinner in branch 'default':
Optimize backslashreplace error handler
https://hg.python.org/cpython/rev/59f4806a5add
msg252574 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-09 00:32
New changeset c134eddcb347 by Victor Stinner in branch 'default':
Issue #25318: Move _PyBytesWriter to bytesobject.c
https://hg.python.org/cpython/rev/c134eddcb347
msg252579 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 00:51
I created the issue #25349 "Use _PyBytesWriter for bytes%args".
msg252580 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-09 00:52
New changeset e9c1404d6bd9 by Victor Stinner in branch 'default':
Issue #25318: Fix compilation error
https://hg.python.org/cpython/rev/e9c1404d6bd9
msg252582 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 01:27
The FreeBSD 9.x buildbot is grumpy.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/3495/steps/test/logs/stdio

Assertion failed: (start[writer->allocated] == 0), function _PyBytesWriter_CheckConsistency, file Objects/bytesobject.c, line 3809.
Fatal Python error: Aborted

Current thread 0x0000000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_pep277.py", line 150 in test_listdir
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/runner.py", line 176 in run
...
msg252583 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-09 01:39
New changeset 9cf89366bbcb by Victor Stinner in branch 'default':
Issue #25318: Avoid sprintf() in backslashreplace()
https://hg.python.org/cpython/rev/9cf89366bbcb

New changeset 0a522f68d275 by Victor Stinner in branch 'default':
Issue #25318: Fix backslashreplace()
https://hg.python.org/cpython/rev/0a522f68d275

New changeset c53dcf1d6967 by Victor Stinner in branch 'default':
Issue #25318: cleanup code _PyBytesWriter
https://hg.python.org/cpython/rev/c53dcf1d6967
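
The changeset titles above only say that sprintf() was avoided in backslashreplace(); the committed code is not shown in this thread. As a hedged illustration of the general technique, the sketch below writes the escape digits with a lookup table instead of a format call. The function name toy_backslash_escape is hypothetical, and the output format follows what Python's backslashreplace handler produces (\xNN, \uNNNN or \UNNNNNNNN with lowercase hex digits).

```c
#include <assert.h>
#include <stdint.h>

/* Write the backslash escape for code point ch at p and return the new end
   of the output. The caller must have reserved at least 10 bytes. */
char *toy_backslash_escape(char *p, uint32_t ch)
{
    static const char hexdigits[] = "0123456789abcdef";
    int ndigits;
    int shift;

    assert(ch <= 0x10FFFF);         /* upper bound of the Unicode range */
    *p++ = '\\';
    if (ch <= 0xFF) {
        *p++ = 'x';
        ndigits = 2;
    }
    else if (ch <= 0xFFFF) {
        *p++ = 'u';
        ndigits = 4;
    }
    else {
        *p++ = 'U';
        ndigits = 8;
    }

    /* Emit the most significant hex digit first, 4 bits at a time. */
    for (shift = 4 * (ndigits - 1); shift >= 0; shift -= 4)
        *p++ = hexdigits[(ch >> shift) & 0xF];
    return p;
}
```

Because the escape length is known from the code point alone (4, 6 or 10 bytes), the caller can reserve the space with a single prepare call and then write directly into the buffer.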
msg252602 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 12:18
Buildbots still like this new API :-) (no test failure recently)

I reworked the API a little bit to make it simpler to use in Unicode encoders. I started opening new issues to use this new API in more functions producing byte strings.

I consider that this issue can now be closed. I'm happy, the API looks good to me and the modified code is faster.
History
Date User Action Args
2015-10-09 12:18:57 haypo set status: open -> closed
resolution: fixed
messages: + msg252602
2015-10-09 01:39:00 python-dev set messages: + msg252583
2015-10-09 01:27:37 haypo set messages: + msg252582
2015-10-09 00:52:49 python-dev set messages: + msg252580
2015-10-09 00:51:25 haypo set messages: + msg252579
2015-10-09 00:32:57 python-dev set messages: + msg252574
2015-10-08 23:46:53 python-dev set messages: + msg252573
2015-10-08 23:04:15 haypo set messages: + msg252571
2015-10-08 22:59:55 python-dev set nosy: + python-dev
messages: + msg252570
2015-10-05 16:17:29 haypo set messages: + msg252335
2015-10-05 12:12:22 haypo set messages: + msg252325
2015-10-05 12:05:32 haypo set files: + bytes_writer.patch
keywords: + patch
2015-10-05 12:04:41 haypo set files: + bench_ucs1_result.txt
2015-10-05 12:04:04 haypo set messages: + msg252324
2015-10-05 12:02:50 haypo set files: + bench_utf8_result.txt
messages: + msg252323
2015-10-05 12:01:28 haypo create