Issue 40313: bytes.hex(sep, bytes_per_sep) is many times slower than manually inserting the separators

This issue tracker has been migrated to GitHub and is now read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/84493

classification

Title:	bytes.hex(sep, bytes_per_sep) is many times slower than manually inserting the separators
Type:	performance	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	gregory.p.smith	Nosy List:	Antony.Lee, Dennis Sweeney, gregory.p.smith, miss-islington, vstinner
Priority:	normal	Keywords:	patch

Created on 2020-04-17 20:58 by Antony.Lee, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 19594	merged	Dennis Sweeney, 2020-04-19 09:05

Messages (5)
msg366678	Author: Antony Lee (Antony.Lee)	Date: 2020-04-17 20:58
Consider the following example, linewrapping 10^4 bytes in hex form to 128 characters per line, on Py 3.8.2 (Arch Linux repo package):

    In [1]: import numpy as np, math
    In [2]: data = np.random.randint(0, 256, (100, 100), dtype=np.uint8).tobytes()
    In [3]: %timeit data.hex("\n", -64)
    123 µs ± 5.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [4]: %timeit h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))])
    45.4 µs ± 746 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [5]: h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))]) == data.hex("\n", -64)
    Out[5]: True

(The last line checks the validity of the code.)

It appears that a naive manual wrap is nearly 3x faster than the builtin functionality.
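For readers without NumPy or IPython, the comparison above can be reproduced with the standard library alone. This is a minimal sketch, assuming Python 3.8 or later (where bytes.hex() accepts sep and bytes_per_sep); the helper names wrap_hex_manual and wrap_hex_builtin are hypothetical and exist only for this illustration.

```python
import os


def wrap_hex_manual(data: bytes, bytes_per_line: int = 64) -> str:
    """Hex-encode first, then slice the string into fixed-width lines."""
    h = data.hex()
    width = 2 * bytes_per_line  # two hex digits per byte
    return "\n".join(h[i:i + width] for i in range(0, len(h), width))


def wrap_hex_builtin(data: bytes, bytes_per_line: int = 64) -> str:
    """Let bytes.hex() insert the separators itself.

    A negative bytes_per_sep counts groups from the left, so any short
    group ends up last, matching the manual slicing above.
    """
    return data.hex("\n", -bytes_per_line)


if __name__ == "__main__":
    data = os.urandom(10_000)  # 10^4 random bytes, as in the report
    assert wrap_hex_manual(data) == wrap_hex_builtin(data)
```

Both helpers can then be timed with timeit or pyperf; the absolute numbers will depend on the interpreter version and machine.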
msg366761	Author: Dennis Sweeney (Dennis Sweeney)	Date: 2020-04-19 07:37
I replicated this behavior. This looks like the relevant loop in pystrhex.c:

    for (i=j=0; i < arglen; ++i) {
        assert((j + 1) < resultlen);
        unsigned char c;
        c = (argbuf[i] >> 4) & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        c = argbuf[i] & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        if (bytes_per_sep_group && i < arglen - 1) {
            Py_ssize_t anchor;
            anchor = (bytes_per_sep_group > 0) ? (arglen - 1 - i) : (i + 1);
            if (anchor % abs_bytes_per_sep == 0) {
                retbuf[j++] = sep_char;
            }
        }
    }

It looks like this can be refactored a bit for a tighter inner loop with fewer if-tests. I can work on a PR.
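To make the separator logic in that loop easier to follow, here is a Python model of it, next to one possible restructuring that decides group boundaries once per group instead of testing after every byte. This is only an illustrative sketch: both function names are hypothetical, and the grouped variant is not claimed to be what PR 19594 actually does.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_with_sep_model(data: bytes, sep: str, bytes_per_sep: int) -> str:
    """Byte-at-a-time model of the quoted C loop (one test per byte)."""
    n = len(data)
    group = abs(bytes_per_sep)
    out = []
    for i, byte in enumerate(data):
        out.append(HEX_DIGITS[byte >> 4])
        out.append(HEX_DIGITS[byte & 0x0F])
        if bytes_per_sep and i < n - 1:
            # Positive: count groups from the right; negative: from the left.
            anchor = (n - 1 - i) if bytes_per_sep > 0 else (i + 1)
            if anchor % group == 0:
                out.append(sep)
    return "".join(out)


def hex_with_sep_grouped(data: bytes, sep: str, bytes_per_sep: int) -> str:
    """Same output, but the boundary decision is made once per group."""
    if not bytes_per_sep:
        return data.hex()
    n = len(data)
    group = abs(bytes_per_sep)
    # Counting from the right puts any short group first;
    # counting from the left puts it last.
    start = n % group if bytes_per_sep > 0 else 0
    chunks = [data[:start].hex()] if start else []
    chunks.extend(data[i:i + group].hex() for i in range(start, n, group))
    return sep.join(chunks)


if __name__ == "__main__":
    sample = b"\xf0\xf1\xf2"
    print(hex_with_sep_model(sample, "_", 2))
    print(hex_with_sep_grouped(sample, "_", -2))
```

The grouped form removes the modulo and the direction test from the per-byte path, which is the kind of saving the comment above is aiming for in C.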
msg366770	Author: Dennis Sweeney (Dennis Sweeney)	Date: 2020-04-19 09:25
========== Master ==========

.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"
Mean +- std dev: 74.3 ms +- 1.1 ms

.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"
Mean +- std dev: 44.0 ms +- 0.3 ms

========== PR 19594 ==========

.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"
Mean +- std dev: 65.2 ms +- 0.6 ms

.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"
Mean +- std dev: 18.1 ms +- 0.1 ms
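The ratios implied by these figures can be worked out directly. The short sketch below only restates the four means reported above; the dictionary layout and the speedup helper are hypothetical names for illustration.

```python
# Mean timings in milliseconds, copied from the pyperf output above.
timings_ms = {
    "master": {"manual_join": 74.3, "hex_sep": 44.0},
    "pr_19594": {"manual_join": 65.2, "hex_sep": 18.1},
}


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster 'after' is than 'before'."""
    return before_ms / after_ms


# Effect of the PR on data.hex('\n', -64) itself.
hex_sep_gain = speedup(
    timings_ms["master"]["hex_sep"], timings_ms["pr_19594"]["hex_sep"]
)

# With the PR applied: builtin separator insertion vs. the manual join.
builtin_vs_manual = speedup(
    timings_ms["pr_19594"]["manual_join"], timings_ms["pr_19594"]["hex_sep"]
)

print(f"hex(sep) speedup from the PR: {hex_sep_gain:.2f}x")
print(f"hex(sep) vs manual join on the PR branch: {builtin_vs_manual:.2f}x")
```

On these numbers the PR makes data.hex('\n', -64) roughly 2.4 times faster than on master, and about 3.6 times faster than the manual join measured on the same branch.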
msg366903	Author: miss-islington (miss-islington)	Date: 2020-04-21 00:17
New changeset 6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5 by sweeneyde in branch 'master':
bpo-40313: speed up bytes.hex() (GH-19594)
https://github.com/python/cpython/commit/6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5
msg366904	Author: STINNER Victor (vstinner)	Date: 2020-04-21 00:32
Thanks Dennis for the optimization! FYI I also pushed another optimization recently:

    commit 455df9779873b8335b20292b8d0c43d66338a4db
    Author: Victor Stinner <vstinner@python.org>
    Date:   Wed Apr 15 14:05:24 2020 +0200

        Optimize _Py_strhex_impl() (GH-19535)

        Avoid a temporary buffer to create a bytes string: use
        PyBytes_FromStringAndSize() to directly allocate a bytes object.

History
Date	User	Action	Args
2022-04-11 14:59:29	admin	set	github: 84493
2020-04-21 00:32:21	vstinner	set	status: open -> closed nosy: + vstinner messages: + msg366904 resolution: fixed stage: patch review -> resolved
2020-04-21 00:17:59	miss-islington	set	nosy: + miss-islington messages: + msg366903
2020-04-20 23:53:07	gregory.p.smith	set	assignee: gregory.p.smith
2020-04-19 13:51:21	xtreak	set	nosy: + gregory.p.smith
2020-04-19 09:25:45	Dennis Sweeney	set	type: performance messages: + msg366770
2020-04-19 09:05:30	Dennis Sweeney	set	keywords: + patch stage: patch review pull_requests: + pull_request18930
2020-04-19 07:37:59	Dennis Sweeney	set	nosy: + Dennis Sweeney messages: + msg366761 versions: + Python 3.9, - Python 3.8
2020-04-17 20:58:29	Antony.Lee	create