classification
Title: Use _PyBytesWriter for bytes%args
Type: performance
Stage:
Components:
Versions: Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ethan.furman, haypo, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2015-10-09 00:50 by haypo, last changed 2016-04-26 10:36 by python-dev. This issue is now closed.

Files
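-----------------------+------------------------
File name              | Uploaded
-----------------------+------------------------
bytes_format.patch     | haypo, 2015-10-09 00:50
bench_bytes_format.py  | haypo, 2015-10-09 10:20
bytes_formatlong.patch | haypo, 2015-10-09 16:57
bench_bytes_int.py     | haypo, 2015-10-09 20:40
-----------------------+------------------------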
Messages (9)
msg252577 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 00:50
The attached patch is a work in progress that uses the new private _PyBytesWriter API in bytes % args.

Using the _PyBytesWriter API allows further optimizations. For example, it avoids creating a temporary bytes object to format b'%f' % 1.2.

The _PyBytesWriter API allocates a small buffer of 512 bytes on the stack to delay the allocation of the final bytes object. It can avoid the need to call _PyBytes_Resize() completely, or at least reduce the number of calls.

See also issue #25318, which added the _PyBytesWriter API.
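
To make the buffering strategy above concrete, here is a toy Python model of it. This is only a sketch: SmallBufferWriter, its methods and the 25% growth factor are hypothetical names and values chosen for illustration, not the C _PyBytesWriter API. Only the 512-byte initial size and the idea of skipping overallocation on the last write come from this issue.

```python
class SmallBufferWriter:
    """Toy model of a writer that starts in a small fixed buffer.

    Hypothetical illustration only; this is not the C _PyBytesWriter API.
    """

    SMALL_BUFFER_SIZE = 512  # initial size quoted in the message above

    def __init__(self):
        # Stands in for the small buffer that the C code keeps on the stack.
        self._buf = bytearray(self.SMALL_BUFFER_SIZE)
        self._size = 0     # number of bytes written so far
        self.resizes = 0   # how many times the buffer had to grow

    def write(self, data, last=False):
        needed = self._size + len(data)
        if needed > len(self._buf):
            # Overallocate (arbitrary 25% here) so later writes fit, except
            # on the last write, where the exact final size is known.
            new_len = needed if last else needed + needed // 4
            self._buf.extend(bytes(new_len - len(self._buf)))
            self.resizes += 1
        self._buf[self._size:needed] = data
        self._size = needed

    def finish(self):
        # Copy the written prefix out as the final immutable bytes object.
        return bytes(self._buf[:self._size])


w = SmallBufferWriter()
w.write(b"hello ")
w.write(b"world", last=True)
print(w.finish(), w.resizes)  # b'hello world' 0
```

In this model a short result never triggers a resize at all, which mirrors the claim that calls to _PyBytes_Resize() can be avoided completely for small outputs.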
msg252578 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 00:51
See also PEP 461, "Adding % formatting to bytes and bytearray".

FYI, bytes % args is tested by test_format, which is handy for quickly testing changes.
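
For readers who have not used PEP 461 formatting, the following sketch shows the kind of expressions involved, using the same formats as the benchmarks in this issue. It assumes Python 3.5 or newer; the variable names are just for illustration. The real test suite can be run with `python -m test test_format`.

```python
# Each pair is (result of bytes % args, expected bytes).
examples = [
    (b"hello %s" % b"world", b"hello world"),
    (b"x=%d" % 123, b"x=123"),
    (b"x=%f" % 1.2, b"x=1.200000"),
    (b"%x" % 0xabcdef, b"abcdef"),
    (b"%c" % 65, b"A"),
]
for actual, expected in examples:
    assert actual == expected, (actual, expected)

# Width and alignment flags work as in str % args: %-100s pads on the right.
padded = b"hello %-100s" % b"world"
assert len(padded) == len(b"hello ") + 100
assert padded.startswith(b"hello world")

# bytearray supports the same operator and returns a bytearray.
assert bytearray(b"x=%d") % 123 == bytearray(b"x=123")
```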
msg252596 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 10:20
bench_bytes_format.py: micro-benchmark testing a few formats. Some tests are focused on the implementation of _PyBytesWriter to ensure that the optimization is efficient.

Except for a single test (which is not really relevant, since it takes less than 500 nanoseconds), all tests are faster.

The b"xxxxxx %s" % b"y" test confirms that the optimization disabling overallocation for the last write is effective.

Results:

Common platform:
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer: time.perf_counter
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign orig:
SCM: hg revision=1aae9b6a6929 tag=tip branch=default date="2015-10-09 01:34 -0400"
Timer precision: 64 ns
Python version: 3.6.0a0 (default:1aae9b6a6929, Oct 9 2015, 11:33:56) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-09 11:34:11

Platform of campaign writer:
SCM: hg revision=fc2c11a19ae1+ tag=tip branch=default date="2015-10-09 11:48 +0200"
Timer precision: 61 ns
Python version: 3.6.0a0 (default:fc2c11a19ae1+, Oct 9 2015, 12:16:16) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-09 12:16:31

---------------------------+------------+--------------
use smaller buffer         |       orig |        writer
---------------------------+------------+--------------
b"hello %s" % b"world"     |  13 ns (*) |   12 ns (-5%)
b"hello %-100s" % b"world" | 158 ns (*) |  98 ns (-38%)
b"x=%d" % 123              |  13 ns (*) |         12 ns
b"x=%f" % 1.2              |  13 ns (*) |         13 ns
b"x=%100d" % 123           | 156 ns (*) |  166 ns (+7%)
---------------------------+------------+--------------
Total                      | 353 ns (*) | 301 ns (-15%)
---------------------------+------------+--------------

-------------------------------------------------+-------------+---------------
"hello %s" % long_string                         |        orig |         writer
-------------------------------------------------+-------------+---------------
fmt = b"hello %s"; arg = b"x" * 10; fmt % arg    |   98 ns (*) |   86 ns (-12%)
fmt = b"hello %s"; arg = b"x" * 100; fmt % arg   |   85 ns (*) |          87 ns
fmt = b"hello %s"; arg = b"x" * 10**3; fmt % arg |  298 ns (*) |  208 ns (-30%)
fmt = b"hello %s"; arg = b"x" * 10**5; fmt % arg |  4.8 us (*) |  4.39 us (-9%)
-------------------------------------------------+-------------+---------------
Total                                            | 5.28 us (*) | 4.77 us (-10%)
-------------------------------------------------+-------------+---------------

---------------------------------------+-------------+---------------
b"xxxxxx %s" % b"y"                    |        orig |         writer
---------------------------------------+-------------+---------------
fmt = b"x" * 10 + b"%s"; fmt % b"y"    |   99 ns (*) |   81 ns (-18%)
fmt = b"x" * 100 + b"%s"; fmt % b"y"   |  189 ns (*) |   87 ns (-54%)
fmt = b"x" * 10**3 + b"%s"; fmt % b"y" | 1.12 us (*) |  209 ns (-81%)
fmt = b"x" * 10**5 + b"%s"; fmt % b"y" | 88.4 us (*) | 8.49 us (-90%)
---------------------------------------+-------------+---------------
Total                                  | 89.8 us (*) | 8.87 us (-90%)
---------------------------------------+-------------+---------------

----------------------------------------------------------+-------------+---------------
%f                                                        |        orig |         writer
----------------------------------------------------------+-------------+---------------
n = 200; fmt = b"%f" * n; arg = tuple([1.2]*n); fmt % arg | 37.2 us (*) | 29.6 us (-21%)
----------------------------------------------------------+-------------+---------------

------------------------------------------------------------+-------------+---------------
%i                                                          |        orig |         writer
------------------------------------------------------------+-------------+---------------
n = 200; fmt = b"%f" * n; arg = tuple([12345]*n); fmt % arg | 49.4 us (*) | 42.8 us (-13%)
------------------------------------------------------------+-------------+---------------

-------------------------+-------------+---------------
Summary                  |        orig |         writer
-------------------------+-------------+---------------
use smaller buffer       |  353 ns (*) |  301 ns (-15%)
"hello %s" % long_string | 5.28 us (*) | 4.77 us (-10%)
b"xxxxxx %s" % b"y"      | 89.8 us (*) | 8.87 us (-90%)
%f                       | 37.2 us (*) | 29.6 us (-21%)
%i                       | 49.4 us (*) | 42.8 us (-13%)
-------------------------+-------------+---------------
Total                    |  182 us (*) | 86.3 us (-53%)
-------------------------+-------------+---------------
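
The b"xxxxxx %s" % b"y" rows above can be approximated with the standard timeit module. This is a minimal sketch, not the attached bench_bytes_format.py: the bench() helper and its repeat/number defaults are assumptions, and the absolute numbers will differ from the tables depending on the machine and Python build.

```python
import timeit


def bench(setup, stmt="fmt % arg", repeat=5, number=10_000):
    """Return the best per-call time in seconds for stmt (hypothetical helper)."""
    timer = timeit.Timer(stmt, setup=setup)
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Long literal prefix followed by one short %s argument written last.
for size in (10, 100, 10**3):
    setup = 'fmt = b"x" * %d + b"%%s"; arg = b"y"' % size
    print("literal prefix of %5d bytes: %.0f ns" % (size, bench(setup) * 1e9))
```

Running the same script on a build before and after changeset b2f3cbdc0f2d is one way to reproduce the orig/writer comparison.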
msg252597 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-09 10:21
New changeset b2f3cbdc0f2d by Victor Stinner in branch 'default':
Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API
https://hg.python.org/cpython/rev/b2f3cbdc0f2d
msg252629 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 16:57
bytes_formatlong.patch: fast path for b'%d' % int and other integer formatters. It avoids creating a temporary bytes object and writes directly into the writer, as '%d' % int (Unicode) already does.
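
The idea of writing digits straight into the output buffer can be sketched in Python as below. This is a toy model under stated assumptions: append_decimal is a hypothetical helper, a bytearray stands in for the writer's buffer, and the digit loop is not the algorithm used by the C patch.

```python
def append_decimal(buf: bytearray, value: int) -> None:
    """Append the decimal digits of value to buf in place (hypothetical helper)."""
    if value < 0:
        buf.append(ord("-"))
        value = -value
    start = len(buf)
    # Emit digits least-significant first, directly into the output buffer.
    while True:
        value, digit = divmod(value, 10)
        buf.append(ord("0") + digit)
        if value == 0:
            break
    # Reverse the digits in place so they read most-significant first.
    i, j = start, len(buf) - 1
    while i < j:
        buf[i], buf[j] = buf[j], buf[i]
        i += 1
        j -= 1


out = bytearray(b"x=")
append_decimal(out, 12345)
assert bytes(out) == b"x=%d" % 12345
```

The point of the sketch is only that no intermediate bytes object for the digits is created and then copied; the digits land in the destination buffer directly.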
msg252650 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 20:40
I wrote the bench_bytes_int.py micro-benchmark; results are below.

Oh, I didn't expect a real difference even for simple code like b'%d' % 12345 (32% faster). So I consider that enough to justify applying the optimization.

Common platform:
Timer: time.perf_counter
Bits: int=32, long=64, long long=64, size_t=64, void*=64
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
Python unicode implementation: PEP 393
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

Platform of campaign orig:
SCM: hg revision=576128c0d068 tag=tip branch=default date="2015-10-09 10:20 -0400"
Python version: 3.6.0a0 (default:576128c0d068, Oct 9 2015, 22:36:21) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-09 22:36:36
Timer precision: 62 ns

Platform of campaign writer:
Python version: 3.6.0a0 (default:576128c0d068+, Oct 9 2015, 22:28:09) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-09 22:34:53
SCM: hg revision=576128c0d068+ tag=tip branch=default date="2015-10-09 10:20 -0400"
Timer precision: 65 ns

------------------------------------------------------------+-------------+---------------
%i                                                          |        orig |         writer
------------------------------------------------------------+-------------+---------------
n = 1; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  155 ns (*) |  105 ns (-32%)
n = 5; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  546 ns (*) |  306 ns (-44%)
n = 10; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 1.03 us (*) |  543 ns (-47%)
n = 25; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 2.49 us (*) | 1.27 us (-49%)
n = 100; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 10.1 us (*) | 5.25 us (-48%)
n = 200; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 20.5 us (*) | 10.8 us (-47%)
n = 500; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 48.8 us (*) | 24.6 us (-50%)
------------------------------------------------------------+-------------+---------------
Total                                                       | 83.6 us (*) | 42.9 us (-49%)
------------------------------------------------------------+-------------+---------------

---------------------------------------------------------------+-------------+---------------
x=%i                                                           |        orig |         writer
---------------------------------------------------------------+-------------+---------------
n = 1; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg   |  173 ns (*) |  123 ns (-29%)
n = 5; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg   |  602 ns (*) |  372 ns (-38%)
n = 10; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg  | 1.14 us (*) |  668 ns (-42%)
n = 25; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg  |  2.8 us (*) | 1.56 us (-44%)
n = 100; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 11.1 us (*) | 6.12 us (-45%)
n = 200; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 21.5 us (*) | 12.1 us (-44%)
n = 500; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 53.5 us (*) | 29.8 us (-44%)
---------------------------------------------------------------+-------------+---------------
Total                                                          | 90.8 us (*) | 50.7 us (-44%)
---------------------------------------------------------------+-------------+---------------

------------------------------------------------------------+-------------+---------------
%x                                                          |        orig |         writer
------------------------------------------------------------+-------------+---------------
n = 1; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  155 ns (*) |  105 ns (-32%)
n = 5; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  545 ns (*) |  306 ns (-44%)
n = 10; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 1.03 us (*) |  543 ns (-47%)
n = 25; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 2.49 us (*) | 1.26 us (-49%)
n = 100; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg |  9.9 us (*) | 5.07 us (-49%)
n = 200; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 19.8 us (*) | 10.1 us (-49%)
n = 500; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 48.9 us (*) | 24.5 us (-50%)
------------------------------------------------------------+-------------+---------------
Total                                                       | 82.8 us (*) | 41.9 us (-49%)
------------------------------------------------------------+-------------+---------------

------------------------------------------------------------------+-------------+---------------
x=%x                                                              |        orig |         writer
------------------------------------------------------------------+-------------+---------------
n = 1; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg   |  183 ns (*) |  132 ns (-28%)
n = 5; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg   |  651 ns (*) |  419 ns (-36%)
n = 10; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg  | 1.23 us (*) |  761 ns (-38%)
n = 25; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg  | 2.96 us (*) | 1.79 us (-40%)
n = 100; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 11.9 us (*) | 7.13 us (-40%)
n = 200; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 23.5 us (*) |   14 us (-41%)
n = 500; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 58.3 us (*) | 34.3 us (-41%)
------------------------------------------------------------------+-------------+---------------
Total                                                             | 98.6 us (*) | 58.5 us (-41%)
------------------------------------------------------------------+-------------+---------------

--------------------------------------------+-------------+--------------
large int: %i                               |        orig |        writer
--------------------------------------------+-------------+--------------
fmt = b"%i"; arg = 10 ** 0 - 1; fmt % arg   |  115 ns (*) |  74 ns (-36%)
fmt = b"%i"; arg = 10 ** 50 - 1; fmt % arg  |  288 ns (*) | 242 ns (-16%)
fmt = b"%i"; arg = 10 ** 100 - 1; fmt % arg |  538 ns (*) |  494 ns (-8%)
fmt = b"%i"; arg = 10 ** 150 - 1; fmt % arg |  865 ns (*) |  812 ns (-6%)
fmt = b"%i"; arg = 10 ** 200 - 1; fmt % arg | 1.33 us (*) |       1.28 us
--------------------------------------------+-------------+--------------
Total                                       | 3.14 us (*) |  2.9 us (-8%)
--------------------------------------------+-------------+--------------

----------------------------------------------+-------------+---------------
large int: x=%i                               |        orig |         writer
----------------------------------------------+-------------+---------------
fmt = b"x=%i"; arg = 10 ** 0 - 1; fmt % arg   |  140 ns (*) |  100 ns (-28%)
fmt = b"x=%i"; arg = 10 ** 50 - 1; fmt % arg  |  298 ns (*) |  249 ns (-16%)
fmt = b"x=%i"; arg = 10 ** 100 - 1; fmt % arg |  548 ns (*) |   502 ns (-8%)
fmt = b"x=%i"; arg = 10 ** 150 - 1; fmt % arg |  874 ns (*) |   822 ns (-6%)
----------------------------------------------+-------------+---------------
Total                                         | 1.86 us (*) | 1.67 us (-10%)
----------------------------------------------+-------------+---------------

-------------------+-------------+---------------
Summary            |        orig |         writer
-------------------+-------------+---------------
%i                 | 83.6 us (*) | 42.9 us (-49%)
x=%i               | 90.8 us (*) | 50.7 us (-44%)
%x                 | 82.8 us (*) | 41.9 us (-49%)
x=%x               | 98.6 us (*) | 58.5 us (-41%)
large int: %i      | 3.14 us (*) |   2.9 us (-8%)
large int: x=%i    | 1.86 us (*) | 1.67 us (-10%)
-------------------+-------------+---------------
Total              |  363 us (*) |  201 us (-45%)
-------------------+-------------+---------------
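
Since the fast path is modeled on '%d' % int for str, a rough way to look at the result is to time the bytes and str versions of the same statement side by side. This is a sketch with assumed names (per_call_ns) and assumed iteration counts, not the attached bench_bytes_int.py, and it makes no claim about which of the two is faster on a given build.

```python
import timeit


def per_call_ns(stmt, setup, number=10_000, repeat=5):
    """Best per-call time in nanoseconds (hypothetical helper)."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9


for n in (1, 5, 10, 25):
    bytes_setup = 'n = %d; fmt = b"%%d" * n; arg = tuple([12345] * n)' % n
    str_setup = 'n = %d; fmt = "%%d" * n; arg = tuple([12345] * n)' % n
    print("n=%3d  bytes: %7.0f ns  str: %7.0f ns" % (
        n,
        per_call_ns("fmt % arg", bytes_setup),
        per_call_ns("fmt % arg", str_setup),
    ))
```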
msg252655 - (view) Author: Roundup Robot (python-dev) Date: 2015-10-09 21:04
New changeset d9a89c9137d2 by Victor Stinner in branch 'default':
Issue #25349: Optimize bytes % int
https://hg.python.org/cpython/rev/d9a89c9137d2

New changeset 4d46d1588629 by Victor Stinner in branch 'default':
Issue #25349: Add fast path for b'%c' % int
https://hg.python.org/cpython/rev/4d46d1588629
msg252657 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-10-09 21:06
OK, I implemented all the optimizations that were already implemented in str % args. I'm closing the issue.
msg264246 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-26 10:36
New changeset 090502a0c69c by Victor Stinner in branch 'default':
Issue #25349, #26249: Fix memleak in formatfloat()
https://hg.python.org/cpython/rev/090502a0c69c
History
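--------------------+------------+--------+-----------------------------------------------------------------
Date                | User       | Action | Args
--------------------+------------+--------+-----------------------------------------------------------------
2016-04-26 10:36:43 | python-dev | set    | messages: + msg264246
2015-10-09 21:06:38 | haypo      | set    | status: open -> closed; resolution: fixed; messages: + msg252657
2015-10-09 21:04:48 | python-dev | set    | messages: + msg252655
2015-10-09 20:40:32 | haypo      | set    | files: + bench_bytes_int.py; messages: + msg252650
2015-10-09 16:57:51 | haypo      | set    | files: + bytes_formatlong.patch; messages: + msg252629
2015-10-09 10:21:59 | python-dev | set    | nosy: + python-dev; messages: + msg252597
2015-10-09 10:20:43 | haypo      | set    | files: + bench_bytes_format.py; messages: + msg252596
2015-10-09 00:51:01 | haypo      | set    | messages: + msg252578
2015-10-09 00:50:25 | haypo      | create |
--------------------+------------+--------+-----------------------------------------------------------------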