classification
Title: Optimize bytes.fromhex() and bytearray.fromhex()
Type: performance
Versions: Python 3.6
process
Status: closed
Resolution: fixed
Nosy List: haypo, python-dev
Priority: normal
Keywords: patch

Created on 2015-10-14 09:09 by haypo, last changed 2015-10-14 10:05 by python-dev. This issue is now closed.

Files
File name Uploaded Description
fromhex.patch haypo, 2015-10-14 09:09
bench_fromhex.py haypo, 2015-10-14 09:32
Messages (4)
msg252979 - Author: STINNER Victor (haypo) (Python committer) Date: 2015-10-14 09:09
Attached patch optimizes bytes.fromhex() and bytearray.fromhex():

* Fast path working on a char* string for ASCII strings
* Slow path for non-ASCII strings
* Replace the slow hex_digit_to_int() function with an O(1) lookup in the precomputed _PyLong_DigitValue table
* Use _PyBytesWriter API to handle the buffer
* Check the error position in error messages
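To make the table-lookup idea concrete, here is a minimal Python sketch of hex decoding driven by a precomputed digit table. It is not the patch itself, which is C code working on a char* buffer through the _PyBytesWriter API. The names DIGIT_VALUE, hex_digit and fromhex_sketch are hypothetical, and the table is only a stand-in for _PyLong_DigitValue. The sketch assumes that only ASCII spaces between byte pairs are skipped.

```python
# Hypothetical stand-in for a precomputed digit table such as
# _PyLong_DigitValue: index by code point, get the hex digit value,
# or -1 when the character is not a hex digit.
DIGIT_VALUE = [-1] * 128
for value, char in enumerate("0123456789abcdef"):
    DIGIT_VALUE[ord(char)] = value
    DIGIT_VALUE[ord(char.upper())] = value


def hex_digit(text, pos):
    """Return the value of the hex digit at text[pos] or raise ValueError."""
    if pos < len(text):
        code = ord(text[pos])
        # Non-ASCII characters can never be hex digits, so reject them
        # before indexing the 128-entry table.
        if code < 128 and DIGIT_VALUE[code] >= 0:
            return DIGIT_VALUE[code]
    raise ValueError(f"non-hexadecimal number found at position {pos}")


def fromhex_sketch(text):
    """Decode pairs of hex digits, skipping spaces between pairs."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        high = hex_digit(text, pos)      # one table lookup per digit
        low = hex_digit(text, pos + 1)
        out.append((high << 4) | low)
        pos += 2
    return bytes(out)
```

Each digit costs one list index instead of a chain of range comparisons, and the position passed to hex_digit() is what ends up in the error message.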
msg252980 - Author: STINNER Victor (haypo) (Python committer) Date: 2015-10-14 09:32
It's between 2 and 3.5x faster.

It's 9% slower on short strings (10 bytes of output), but I consider the speedup on longer strings more significant than the slowdown on short ones.
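As a quick sanity check of these figures, the following sketch recomputes the speedup factor and the relative change from the rounded timings in the "without spaces" table of this message. The helper names are hypothetical, and because the inputs are the rounded values as displayed, the results may differ from the table's percentages by about one point.

```python
# (orig, optim) timings in nanoseconds, copied from the
# "without spaces" table; keys are the repeat counts of "AB".
TIMINGS_NS = {
    10: (167, 181),
    100: (621, 295),
    10**3: (5150, 1650),
    10**5: (500000, 147000),
}


def speedup(orig, optim):
    """How many times faster the optimized run is (values below 1 mean slower)."""
    return orig / optim


def relative_change(orig, optim):
    """Change of the optimized timing relative to the original, in percent."""
    return (optim - orig) / orig * 100


if __name__ == "__main__":
    for size, (orig, optim) in TIMINGS_NS.items():
        print(f"{size:>6}: x{speedup(orig, optim):.2f} "
              f"({relative_change(orig, optim):+.1f}%)")
```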

Microbenchmark:

Common platform:
Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
Timer: time.perf_counter
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Python unicode implementation: PEP 393
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign orig:
SCM: hg revision=90e41d965228 tag=tip branch=default date="2015-10-14 10:10 +0200"
Python version: 3.6.0a0 (default:90e41d965228, Oct 14 2015, 10:46:50) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-14 10:47:05
Timer precision: 54 ns

Platform of campaign optim:
SCM: hg revision=90e41d965228+ tag=tip branch=default date="2015-10-14 10:10 +0200"
Python version: 3.6.0a0 (default:90e41d965228+, Oct 14 2015, 11:07:24) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-14 11:09:53
Timer precision: 62 ns

-----------------------------------------+-------------+---------------
without spaces                           |        orig |          optim
-----------------------------------------+-------------+---------------
data = "AB" * 10; bytes.fromhex(data)    |  167 ns (*) |   181 ns (+9%)
data = "AB" * 100; bytes.fromhex(data)   |  621 ns (*) |  295 ns (-52%)
data = "AB" * 10**3; bytes.fromhex(data) | 5.15 us (*) | 1.65 us (-68%)
data = "AB" * 10**5; bytes.fromhex(data) |  500 us (*) |  147 us (-71%)
-----------------------------------------+-------------+---------------
Total                                    |  506 us (*) |  149 us (-70%)
-----------------------------------------+-------------+---------------

---------------------------------------------------+-------------+---------------
with 0.5 space                                     |        orig |          optim
---------------------------------------------------+-------------+---------------
data = "ABAB " * (10 // 2); bytes.fromhex(data)    |  179 ns (*) |         186 ns
data = "ABAB " * (100 // 2); bytes.fromhex(data)   |  659 ns (*) |  340 ns (-48%)
data = "ABAB " * (10**3 // 2); bytes.fromhex(data) | 5.48 us (*) | 2.19 us (-60%)
data = "ABAB " * (10**5 // 2); bytes.fromhex(data) |  529 us (*) |  194 us (-63%)
---------------------------------------------------+-------------+---------------
Total                                              |  536 us (*) |  196 us (-63%)
---------------------------------------------------+-------------+---------------

------------------------------------------+-------------+---------------
with 1 space                              |        orig |          optim
------------------------------------------+-------------+---------------
data = "AB " * 10; bytes.fromhex(data)    |  180 ns (*) |   191 ns (+6%)
data = "AB " * 100; bytes.fromhex(data)   |  710 ns (*) |  330 ns (-54%)
data = "AB " * 10**3; bytes.fromhex(data) | 5.77 us (*) | 1.99 us (-66%)
data = "AB " * 10**5; bytes.fromhex(data) |  559 us (*) |  177 us (-68%)
------------------------------------------+-------------+---------------
Total                                     |  565 us (*) |  179 us (-68%)
------------------------------------------+-------------+---------------

---------------+-------------+--------------
Summary        |        orig |         optim
---------------+-------------+--------------
without spaces |  506 us (*) | 149 us (-70%)
with 0.5 space |  536 us (*) | 196 us (-63%)
with 1 space   |  565 us (*) | 179 us (-68%)
---------------+-------------+--------------
Total          | 1.61 ms (*) | 525 us (-67%)
---------------+-------------+--------------
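The attached bench_fromhex.py is not reproduced in this thread. A rough way to rerun comparable measurements with the standard timeit module might look like the sketch below. The input strings follow the table rows above, while the structure, the names CASES, SIZES, bench and run, and the repeat counts are assumptions. It also assumes that building the input string is kept out of the timed statement.

```python
import timeit

# Input builders mirroring the three table groups above.
CASES = {
    "without spaces": lambda n: "AB" * n,
    "with 0.5 space": lambda n: "ABAB " * (n // 2),
    "with 1 space": lambda n: "AB " * n,
}
SIZES = (10, 100, 10**3, 10**5)


def bench(data, repeat=3, number=100):
    """Return the best per-call time in seconds for bytes.fromhex(data)."""
    timer = timeit.Timer("bytes.fromhex(data)", globals={"data": data})
    # Take the minimum over several runs to reduce noise from the system.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def run():
    """Time every (case, size) combination and return the results."""
    return {
        (label, size): bench(make(size))
        for label, make in CASES.items()
        for size in SIZES
    }


if __name__ == "__main__":
    for (label, size), seconds in run().items():
        print(f"{label:<15} n={size:<7} {seconds * 1e9:10.0f} ns")
```

Absolute numbers depend on the machine and the build, so only the ratio between an unpatched and a patched interpreter is meaningful.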
msg252981 - Author: Roundup Robot (python-dev) Date: 2015-10-14 09:32
New changeset 55d207a637ff by Victor Stinner in branch 'default':
Optimize bytes.fromhex() and bytearray.fromhex()
https://hg.python.org/cpython/rev/55d207a637ff
msg252982 - Author: Roundup Robot (python-dev) Date: 2015-10-14 10:05
New changeset 09e0533f3694 by Victor Stinner in branch 'default':
Issue #25401: Remove now unused hex_digit_to_int() function
https://hg.python.org/cpython/rev/09e0533f3694
History
Date User Action Args
2015-10-14 10:05:53 python-dev set messages: + msg252982
2015-10-14 09:33:09 haypo set status: open -> closed; resolution: fixed
2015-10-14 09:32:43 python-dev set nosy: + python-dev; messages: + msg252981
2015-10-14 09:32:14 haypo set files: + bench_fromhex.py; messages: + msg252980
2015-10-14 09:09:20 haypo create