
classification
Title: bytes.hex(sep, bytes_per_sep) is many times slower than manually inserting the separators
Type: performance
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gregory.p.smith
Nosy List: Antony.Lee, Dennis Sweeney, gregory.p.smith, miss-islington, vstinner
Priority: normal
Keywords: patch

Created on 2020-04-17 20:58 by Antony.Lee, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR 19594: merged; linked by Dennis Sweeney, 2020-04-19 09:05
Messages (5)
msg366678 - (view) Author: Antony Lee (Antony.Lee) * Date: 2020-04-17 20:58
Consider the following example, which line-wraps 10^4 bytes in hex form to 128 characters per line, on Python 3.8.2 (Arch Linux repo package):

    In [1]: import numpy as np, math

    In [2]: data = np.random.randint(0, 256, (100, 100), dtype=np.uint8).tobytes()                  

    In [3]: %timeit data.hex("\n", -64)
    123 µs ± 5.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [4]: %timeit h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))])
    45.4 µs ± 746 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [5]: h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))]) == data.hex("\n", -64)                                                                       
    Out[5]: True

(The last line checks that both approaches produce the same output.)

The naive manual wrap is about 2.7x faster (45.4 µs vs. 123 µs) than the built-in separator support.
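
For anyone who wants to reproduce this without NumPy or IPython, here is a minimal sketch using only the standard library. The helper name `manual_wrap` and the use of `os.urandom` as the data source are assumptions made for illustration; they are not part of the original report, and absolute timings will differ by machine and Python version.

```python
import os
import timeit


def manual_wrap(data: bytes, bytes_per_line: int = 64) -> str:
    """Hex-encode first, then insert newlines by slicing the hex string."""
    h = data.hex()
    width = 2 * bytes_per_line  # each byte becomes two hex digits
    return "\n".join(h[i:i + width] for i in range(0, len(h), width))


data = os.urandom(10_000)  # stand-in for the NumPy-generated bytes

# A negative bytes_per_sep counts groups from the left, which is what
# the manual slicing does, so both results should be identical.
assert manual_wrap(data) == data.hex("\n", -64)

for label, func in [
    ("builtin", lambda: data.hex("\n", -64)),
    ("manual", lambda: manual_wrap(data)),
]:
    # Take the best of a few repeats to reduce noise.
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f"{label}: {best * 1e6:.1f} µs per call")
```

On an interpreter that includes the fix from this issue, the built-in call would be expected to come out ahead rather than behind.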
msg366761 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2020-04-19 07:37
I replicated this behavior. This looks like the relevant loop in pystrhex.c:

    for (i=j=0; i < arglen; ++i) {
        assert((j + 1) < resultlen);
        unsigned char c;
        c = (argbuf[i] >> 4) & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        c = argbuf[i] & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        if (bytes_per_sep_group && i < arglen - 1) {
            Py_ssize_t anchor;
            anchor = (bytes_per_sep_group > 0) ? (arglen - 1 - i) : (i + 1);
            if (anchor % abs_bytes_per_sep == 0) {
                retbuf[j++] = sep_char;
            }
        }
    }

It looks like this can be refactored a bit for a tighter inner loop with fewer if-tests. I can work on a PR.
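
To make the idea concrete, here is a rough Python model of the two loop shapes. It is only a sketch of the general technique (decide the group boundaries once, then convert each group without a per-byte separator test); it is not the C code from the eventual PR, and the function names `hex_per_byte_check` and `hex_chunked` are hypothetical.

```python
HEX = "0123456789abcdef"


def hex_per_byte_check(data: bytes, sep: str, bytes_per_sep: int) -> str:
    """Model of the quoted C loop: the separator test runs for every byte."""
    n, group = len(data), abs(bytes_per_sep)
    out = []
    for i, b in enumerate(data):
        out.append(HEX[b >> 4])
        out.append(HEX[b & 0x0F])
        if bytes_per_sep and i < n - 1:
            # Positive counts groups from the right, negative from the left.
            anchor = (n - 1 - i) if bytes_per_sep > 0 else (i + 1)
            if anchor % group == 0:
                out.append(sep)
    return "".join(out)


def hex_chunked(data: bytes, sep: str, bytes_per_sep: int) -> str:
    """Hypothetical restructuring: compute group boundaries up front."""

    def digits(chunk: bytes) -> str:
        # Inner loop with no separator logic at all.
        return "".join(HEX[b >> 4] + HEX[b & 0x0F] for b in chunk)

    n, group = len(data), abs(bytes_per_sep)
    if group == 0:
        return digits(data)
    # Counting from the right puts the short group (if any) first;
    # counting from the left puts it last.
    first = (n % group or group) if bytes_per_sep > 0 else group
    bounds = [0, *range(first, n, group), n]
    return sep.join(digits(data[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))


sample = bytes(range(10))
print(hex_chunked(sample, ":", 4))
print(hex_chunked(sample, ":", -4))
```

Both functions should agree with `bytes.hex(sep, bytes_per_sep)` for nonzero group sizes; the difference is only in where the branching happens.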
msg366770 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2020-04-19 09:25
========== Master ==========

.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"

Mean +- std dev: 74.3 ms +- 1.1 ms

.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"

Mean +- std dev: 44.0 ms +- 0.3 ms

========== PR 19594 ==========

.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"

Mean +- std dev: 65.2 ms +- 0.6 ms

.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"

Mean +- std dev: 18.1 ms +- 0.1 ms
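
As a quick reading aid, the ratios implied by the four means above can be worked out as follows. The dictionary layout and variable names are just for illustration; the numbers are copied from the pyperf output in this message.

```python
# Mean timings in milliseconds, taken from the pyperf runs above.
timings_ms = {
    "master": {"manual": 74.3, "builtin": 44.0},
    "PR 19594": {"manual": 65.2, "builtin": 18.1},
}

# How much faster data.hex('\n', -64) became with the PR applied.
builtin_speedup = timings_ms["master"]["builtin"] / timings_ms["PR 19594"]["builtin"]

for build, t in timings_ms.items():
    ratio = t["manual"] / t["builtin"]
    print(f"{build}: manual join takes {ratio:.2f}x as long as data.hex(sep, n)")

print(f"data.hex(sep, n) speedup from the PR: {builtin_speedup:.2f}x")
```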
msg366903 - (view) Author: miss-islington (miss-islington) Date: 2020-04-21 00:17
New changeset 6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5 by sweeneyde in branch 'master':
bpo-40313: speed up bytes.hex() (GH-19594)
https://github.com/python/cpython/commit/6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5
msg366904 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-21 00:32
Thanks Dennis for the optimization!

FYI I also pushed another optimization recently:

commit 455df9779873b8335b20292b8d0c43d66338a4db
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Apr 15 14:05:24 2020 +0200

    Optimize _Py_strhex_impl() (GH-19535)
    
    Avoid a temporary buffer to create a bytes string: use
    PyBytes_FromStringAndSize() to directly allocate a bytes object.
History
Date User Action Args
2022-04-11 14:59:29 admin set github: 84493
2020-04-21 00:32:21 vstinner set status: open -> closed; nosy: + vstinner; messages: + msg366904; resolution: fixed; stage: patch review -> resolved
2020-04-21 00:17:59 miss-islington set nosy: + miss-islington; messages: + msg366903
2020-04-20 23:53:07 gregory.p.smith set assignee: gregory.p.smith
2020-04-19 13:51:21 xtreak set nosy: + gregory.p.smith
2020-04-19 09:25:45 Dennis Sweeney set type: performance; messages: + msg366770
2020-04-19 09:05:30 Dennis Sweeney set keywords: + patch; stage: patch review; pull_requests: + pull_request18930
2020-04-19 07:37:59 Dennis Sweeney set nosy: + Dennis Sweeney; messages: + msg366761; versions: + Python 3.9, - Python 3.8
2020-04-17 20:58:29 Antony.Lee create