classification
Title: Optimize bytearray % args
Type: performance Stage:
Components: Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-10-14 00:42 by haypo, last changed 2015-10-14 09:05 by haypo. This issue is now closed.

Files
File name Uploaded Description
bytearray_format.patch haypo, 2015-10-14 00:42
bench_bytes_format.py haypo, 2015-10-14 08:01
Messages (3)
msg252970 - Author: STINNER Victor (haypo) (Python committer) Date: 2015-10-14 00:42
Optimize bytearray % args

Don't create temporary bytes objects: modify _PyBytes_Format() to work
directly on bytearray objects.

* _PyBytesWriter: add use_bytearray attribute to use a bytearray buffer
* Rename _PyBytes_Format() to _PyBytes_FormatEx(), just in case something
  outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
  bytearray_format() doesn't need to create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
  _PyBytesWriter, to create a bytearray buffer instead of a bytes buffer
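
For illustration only, here is a Python-level analogy of the change described above. It is a sketch, not the C implementation: the two helper names are hypothetical, and the "temporary" variables only mimic the intermediate bytes objects that the old C code path created. Both functions must return the same bytearray; the patch only removes the copies.

```python
def format_via_temporary_bytes(fmt, args):
    """Analogy of the old path: copy to bytes, format, copy back."""
    tmp_fmt = bytes(fmt)          # temporary input bytes object
    tmp_result = tmp_fmt % args   # temporary output bytes object
    return bytearray(tmp_result)  # final copy into a bytearray


def format_directly(fmt, args):
    """What user code writes; the result of bytearray % args is a bytearray."""
    return fmt % args


fmt = bytearray(b"hello %s, x=%d")
args = (b"world", 123)

# Both paths produce the same value and the same type.
assert format_via_temporary_bytes(fmt, args) == format_directly(fmt, args)
assert type(format_directly(fmt, args)) is bytearray
print(format_directly(fmt, args))  # bytearray(b'hello world, x=123')
```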
msg252977 - Author: STINNER Victor (haypo) (Python committer) Date: 2015-10-14 08:01
Microbenchmark result below.

Most operations are now between 2.5 and 5 times faster. %f is just as fast as before, probably because formatting a float is more expensive than copying bytes (rough estimate: 150 ns to format a single floating-point number).
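
A rough way to reproduce a few of these measurements with only the standard library is sketched below. This is not the attached bench_bytes_format.py; the helper name time_statement and the repeat/number values are assumptions chosen for illustration. timeit uses time.perf_counter by default, the same timer as listed for this run. Absolute numbers depend on the machine and the Python build.

```python
import timeit


def time_statement(setup, stmt, repeat=5, number=100_000):
    """Return the best observed time per execution of stmt, in nanoseconds."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number * 1e9


# (setup, statement) pairs taken from the first benchmark table.
cases = [
    ('fmt = bytearray(b"hello %s")', 'fmt % b"world"'),
    ('fmt = bytearray(b"x=%d")', 'fmt % 123'),
    ('fmt = bytearray(b"x=%f")', 'fmt % 1.2'),
]

for setup, stmt in cases:
    print("%s; %s: %.0f ns" % (setup, stmt, time_statement(setup, stmt)))
```

Run the same script under a build without the patch and a build with it to compare the two columns.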

Common platform:
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Bits: int=32, long=64, long long=64, size_t=64, void*=64
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393

Platform of campaign orig:
Python version: 3.6.0a0 (default:af34d0626fb4, Oct 14 2015, 09:51:04) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Timer precision: 64 ns
Date: 2015-10-14 09:51:20
SCM: hg revision=af34d0626fb4 branch=default date="2015-10-14 09:47 +0200"

Platform of campaign no_copy:
Timer precision: 59 ns
Python version: 3.6.0a0 (default:2e9d9873d2be, Oct 14 2015, 09:49:28) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-14 09:49:44
SCM: hg revision=2e9d9873d2be tag=tip branch=default date="2015-10-14 09:41 +0200"

-------------------------------------------------+-------------+--------------
use smaller buffer                               |        orig |       no_copy
-------------------------------------------------+-------------+--------------
fmt = bytearray(b"hello %s"); fmt % b"world"     |  656 ns (*) |  93 ns (-86%)
fmt = bytearray(b"hello %-100s"); fmt % b"world" |  686 ns (*) | 105 ns (-85%)
fmt = bytearray(b"x=%d"); fmt % 123              |  689 ns (*) | 112 ns (-84%)
fmt = bytearray(b"x=%f"); fmt % 1.2              |  976 ns (*) | 216 ns (-78%)
fmt = bytearray(b"x=%100d"); fmt % 123           |  870 ns (*) | 172 ns (-80%)
-------------------------------------------------+-------------+--------------
Total                                            | 3.88 us (*) | 698 ns (-82%)
-------------------------------------------------+-------------+--------------

------------------------------------------------------------+-------------+---------------
"hello %s" % long_string                                    |        orig |        no_copy
------------------------------------------------------------+-------------+---------------
fmt = bytearray(b"hello %s"); arg = b"x" * 10; fmt % arg    |  661 ns (*) |   93 ns (-86%)
fmt = bytearray(b"hello %s"); arg = b"x" * 100; fmt % arg   |  667 ns (*) |   93 ns (-86%)
fmt = bytearray(b"hello %s"); arg = b"x" * 10**3; fmt % arg |  982 ns (*) |  186 ns (-81%)
fmt = bytearray(b"hello %s"); arg = b"x" * 10**5; fmt % arg | 10.2 us (*) | 4.42 us (-57%)
------------------------------------------------------------+-------------+---------------
Total                                                       | 12.5 us (*) |  4.8 us (-62%)
------------------------------------------------------------+-------------+---------------

--------------------------------------------------+-------------+---------------
b"xxxxxx %s" % b"y"                               |        orig |        no_copy
--------------------------------------------------+-------------+---------------
fmt = bytearray(b"x" * 10 + b"%s"); fmt % b"y"    |  653 ns (*) |   88 ns (-86%)
fmt = bytearray(b"x" * 100 + b"%s"); fmt % b"y"   |  674 ns (*) |   94 ns (-86%)
fmt = bytearray(b"x" * 10**3 + b"%s"); fmt % b"y" | 1.09 us (*) |  213 ns (-80%)
fmt = bytearray(b"x" * 10**5 + b"%s"); fmt % b"y" | 21.4 us (*) | 8.47 us (-60%)
--------------------------------------------------+-------------+---------------
Total                                             | 23.8 us (*) | 8.87 us (-63%)
--------------------------------------------------+-------------+---------------

---------------------------------------------------------------------+-------------+--------
%f                                                                   |        orig | no_copy
---------------------------------------------------------------------+-------------+--------
n = 200; fmt = bytearray(b"%f" * n); arg = tuple([1.2]*n); fmt % arg | 32.2 us (*) | 32.3 us
---------------------------------------------------------------------+-------------+--------

-----------------------------------------------------------------------+-------------+---------------
%i                                                                     |        orig |        no_copy
-----------------------------------------------------------------------+-------------+---------------
n = 1; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg   |  678 ns (*) |  105 ns (-85%)
n = 5; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg   |  884 ns (*) |  296 ns (-66%)
n = 10; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg  | 1.13 us (*) |  531 ns (-53%)
n = 25; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg  | 1.85 us (*) | 1.24 us (-33%)
n = 100; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 5.62 us (*) |  4.8 us (-15%)
n = 200; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 10.6 us (*) |        10.8 us
n = 500; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 25.1 us (*) |  26.8 us (+7%)
-----------------------------------------------------------------------+-------------+---------------
Total                                                                  | 45.9 us (*) |        44.6 us
-----------------------------------------------------------------------+-------------+---------------

--------------------------------------------------------------------------+-------------+---------------
x=%i                                                                      |        orig |        no_copy
--------------------------------------------------------------------------+-------------+---------------
n = 1; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg   |  699 ns (*) |  123 ns (-82%)
n = 5; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg   |  943 ns (*) |  364 ns (-61%)
n = 10; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg  | 1.22 us (*) |  655 ns (-47%)
n = 25; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg  | 2.08 us (*) | 1.52 us (-27%)
n = 100; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg | 6.86 us (*) |        6.79 us
n = 200; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg | 12.6 us (*) |  13.3 us (+6%)
n = 500; fmt = bytearray(b"x=%d " * n); arg = tuple([12345]*n); fmt % arg | 29.7 us (*) |  32.4 us (+9%)
--------------------------------------------------------------------------+-------------+---------------
Total                                                                     | 54.1 us (*) |        55.2 us
--------------------------------------------------------------------------+-------------+---------------

-----------------------------------------------------------------------+-------------+---------------
%x                                                                     |        orig |        no_copy
-----------------------------------------------------------------------+-------------+---------------
n = 1; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg   |  677 ns (*) |  105 ns (-85%)
n = 5; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg   |  886 ns (*) |  297 ns (-67%)
n = 10; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg  | 1.13 us (*) |  530 ns (-53%)
n = 25; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg  | 1.85 us (*) | 1.24 us (-33%)
n = 100; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 5.64 us (*) | 4.82 us (-15%)
n = 200; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 10.7 us (*) |        10.8 us
n = 500; fmt = bytearray(b"%d" * n); arg = tuple([12345]*n); fmt % arg | 25.2 us (*) |  26.8 us (+7%)
-----------------------------------------------------------------------+-------------+---------------
Total                                                                  | 46.1 us (*) |        44.6 us
-----------------------------------------------------------------------+-------------+---------------

-----------------------------------------------------------------------------+-------------+---------------
x=%x                                                                         |        orig |        no_copy
-----------------------------------------------------------------------------+-------------+---------------
n = 1; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg   |  685 ns (*) |  120 ns (-82%)
n = 5; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg   |  916 ns (*) |  342 ns (-63%)
n = 10; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg  | 1.19 us (*) |  609 ns (-49%)
n = 25; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg  | 1.99 us (*) | 1.41 us (-29%)
n = 100; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg | 6.64 us (*) |        6.43 us
n = 200; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg | 11.9 us (*) |  12.7 us (+7%)
n = 500; fmt = bytearray(b"x=%x " * n); arg = tuple([0xabcdef]*n); fmt % arg | 28.2 us (*) |  30.5 us (+8%)
-----------------------------------------------------------------------------+-------------+---------------
Total                                                                        | 51.5 us (*) |        52.1 us
-----------------------------------------------------------------------------+-------------+---------------

-------------------------------------------------------+-------------+---------------
large int: %i                                          |        orig |        no_copy
-------------------------------------------------------+-------------+---------------
fmt = bytearray(b"%i"); arg = 10 ** 0 - 1; fmt % arg   |  651 ns (*) |   75 ns (-89%)
fmt = bytearray(b"%i"); arg = 10 ** 50 - 1; fmt % arg  |  810 ns (*) |  245 ns (-70%)
fmt = bytearray(b"%i"); arg = 10 ** 100 - 1; fmt % arg | 1.06 us (*) |  496 ns (-53%)
fmt = bytearray(b"%i"); arg = 10 ** 150 - 1; fmt % arg | 1.38 us (*) |  819 ns (-41%)
fmt = bytearray(b"%i"); arg = 10 ** 200 - 1; fmt % arg | 1.87 us (*) | 1.28 us (-32%)
-------------------------------------------------------+-------------+---------------
Total                                                  | 5.78 us (*) | 2.91 us (-50%)
-------------------------------------------------------+-------------+---------------

---------------------------------------------------------+-------------+---------------
large int: x=%i                                          |        orig |        no_copy
---------------------------------------------------------+-------------+---------------
fmt = bytearray(b"x=%i"); arg = 10 ** 0 - 1; fmt % arg   |  674 ns (*) |  103 ns (-85%)
fmt = bytearray(b"x=%i"); arg = 10 ** 50 - 1; fmt % arg  |  820 ns (*) |  254 ns (-69%)
fmt = bytearray(b"x=%i"); arg = 10 ** 100 - 1; fmt % arg | 1.07 us (*) |  503 ns (-53%)
fmt = bytearray(b"x=%i"); arg = 10 ** 150 - 1; fmt % arg |  1.4 us (*) |  824 ns (-41%)
---------------------------------------------------------+-------------+---------------
Total                                                    | 3.96 us (*) | 1.68 us (-58%)
---------------------------------------------------------+-------------+---------------

-------------------------+-------------+---------------
Summary                  |        orig |        no_copy
-------------------------+-------------+---------------
use smaller buffer       | 3.88 us (*) |  698 ns (-82%)
"hello %s" % long_string | 12.5 us (*) |  4.8 us (-62%)
b"xxxxxx %s" % b"y"      | 23.8 us (*) | 8.87 us (-63%)
%f                       | 32.2 us (*) |        32.3 us
%i                       | 45.9 us (*) |        44.6 us
x=%i                     | 54.1 us (*) |        55.2 us
%x                       | 46.1 us (*) |        44.6 us
x=%x                     | 51.5 us (*) |        52.1 us
large int: %i            | 5.78 us (*) | 2.91 us (-50%)
large int: x=%i          | 3.96 us (*) | 1.68 us (-58%)
-------------------------+-------------+---------------
Total                    |  280 us (*) |  248 us (-11%)
-------------------------+-------------+---------------
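
As a reading aid, the percentages in the no_copy column are consistent with the relative change (no_copy - orig) / orig. The sketch below recomputes them from the Summary rows. The function names are hypothetical, and the 5% cutoff below which no percentage is printed is an assumption inferred from the tables, not taken from the benchmark tool.

```python
# (orig, no_copy) timings in microseconds, copied from the Summary table.
summary_us = {
    "use smaller buffer": (3.88, 0.698),
    '"hello %s" % long_string': (12.5, 4.8),
    'b"xxxxxx %s" % b"y"': (23.8, 8.87),
    "%f": (32.2, 32.3),
    "%i": (45.9, 44.6),
    "x=%i": (54.1, 55.2),
    "%x": (46.1, 44.6),
    "x=%x": (51.5, 52.1),
    "large int: %i": (5.78, 2.91),
    "large int: x=%i": (3.96, 1.68),
}


def relative_change(orig, new):
    """Relative change of new versus orig, in percent (negative = faster)."""
    return (new - orig) / orig * 100.0


def format_change(orig, new, threshold=5.0):
    """Format the change like the tables; threshold is an assumed cutoff."""
    change = relative_change(orig, new)
    if abs(change) < threshold:
        return ""
    return "(%+.0f%%)" % change


for name, (orig, new) in summary_us.items():
    print("%-26s %s" % (name, format_change(orig, new)))

total_orig = sum(orig for orig, _ in summary_us.values())
total_new = sum(new for _, new in summary_us.values())
print("Total: %.0f us -> %.0f us %s"
      % (total_orig, total_new, format_change(total_orig, total_new)))
```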
msg252978 - Author: Roundup Robot (python-dev) Date: 2015-10-14 08:02
New changeset 03646293f1b3 by Victor Stinner in branch 'default':
Fix long_format_binary()
https://hg.python.org/cpython/rev/03646293f1b3

New changeset 6fe0050a2f52 by Victor Stinner in branch 'default':
Add use_bytearray attribute to _PyBytesWriter
https://hg.python.org/cpython/rev/6fe0050a2f52

New changeset f369b79c0153 by Victor Stinner in branch 'default':
Optimize bytearray % args
https://hg.python.org/cpython/rev/f369b79c0153
History
Date User Action Args
2015-10-14 09:05:21 haypo set status: open -> closed; resolution: fixed
2015-10-14 08:02:11 python-dev set nosy: + python-dev; messages: + msg252978
2015-10-14 08:01:45 haypo set files: + bench_bytes_format.py; messages: + msg252977
2015-10-14 00:42:28 haypo create