Optimize bytes.fromhex() and bytearray.fromhex() #69587

Closed
vstinner opened this issue Oct 14, 2015 · 4 comments
Labels
performance Performance or resource usage

Comments

@vstinner
Member

BPO 25401
Nosy @vstinner
Files
  • fromhex.patch
  • bench_fromhex.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-10-14.09:33:09.429>
    created_at = <Date 2015-10-14.09:09:20.369>
    labels = ['performance']
    title = 'Optimize bytes.fromhex() and bytearray.fromhex()'
    updated_at = <Date 2015-10-14.10:05:53.292>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2015-10-14.10:05:53.292>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-10-14.09:33:09.429>
    closer = 'vstinner'
    components = []
    creation = <Date 2015-10-14.09:09:20.369>
    creator = 'vstinner'
    dependencies = []
    files = ['40779', '40780']
    hgrepos = []
    issue_num = 25401
    keywords = ['patch']
    message_count = 4.0
    messages = ['252979', '252980', '252981', '252982']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue25401'
    versions = ['Python 3.6']

    @vstinner
    Member Author

    Attached patch optimizes bytes.fromhex() and bytearray.fromhex():

    • Fast path working directly on a char* buffer for ASCII strings
    • Slow path for non-ASCII strings
    • Replace the slow hex_digit_to_int() function with an O(1) lookup in the precomputed _PyLong_DigitValue table
    • Use the _PyBytesWriter API to handle the output buffer
    • Add unit tests checking the error position reported in error messages
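
The patch itself is C code inside CPython. As a rough Python illustration of the same ideas (an ASCII fast path over raw bytes, a separate path for non-ASCII input, and an O(1) table lookup instead of per-character range checks), here is a minimal sketch. The names DIGIT_VALUE, INVALID and fromhex_sketch are hypothetical; the table only mimics the role of _PyLong_DigitValue. In this sketch the non-ASCII path simply reports the first non-ASCII character, since bytes.fromhex() only accepts ASCII input, and whitespace handling follows current Python (any ASCII whitespace between pairs is skipped; Python 3.6 skipped only spaces).

```python
# Illustrative sketch only; the real optimization is C code in CPython.
# DIGIT_VALUE, INVALID and fromhex_sketch are hypothetical names.
INVALID = 37  # any value >= 16 means "not a hex digit"

# 256-entry table: byte value -> hex digit value, or INVALID.
DIGIT_VALUE = [INVALID] * 256
for digits in (b"0123456789abcdef", b"0123456789ABCDEF"):
    for value, ch in enumerate(digits):
        DIGIT_VALUE[ch] = value

ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"


def _error(pos: int) -> ValueError:
    # Message modeled on the ValueError raised by bytes.fromhex().
    return ValueError(f"non-hexadecimal number found in fromhex() arg at position {pos}")


def fromhex_sketch(s: str) -> bytes:
    if not s.isascii():
        # Non-ASCII input can never be valid hex: just locate the
        # first non-ASCII character to report it in the error.
        pos = next(i for i, c in enumerate(s) if ord(c) >= 128)
        raise _error(pos)

    data = s.encode("ascii")  # fast path: iterate over raw bytes
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] in ASCII_WHITESPACE:  # whitespace only between pairs
            i += 1
            continue
        top = DIGIT_VALUE[data[i]]  # O(1) lookup replaces range checks
        if top >= 16:
            raise _error(i)
        i += 1
        bot = DIGIT_VALUE[data[i]] if i < n else INVALID  # odd length -> error
        if bot >= 16:
            raise _error(i)
        i += 1
        out.append((top << 4) | bot)
    return bytes(out)
```

For example, fromhex_sketch("de ad BE EF") returns the same bytes as bytes.fromhex("de ad BE EF"), while fromhex_sketch("ABC") raises ValueError at position 3 because the last pair is incomplete.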

    @vstinner vstinner added the performance Performance or resource usage label Oct 14, 2015
    @vstinner
    Member Author

    It's roughly 2x to 3.5x faster once the output is 100 bytes or longer.

    It's up to 9% slower on short strings (10 bytes of output), but I consider the speedup on longer strings more valuable than the slowdown on short ones.
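
As a quick arithmetic check of that range, the speedup factors can be computed from the "without spaces" timings reported in the first table below. The numbers are copied from that table (nothing is re-measured), and TIMINGS_NS and speedups are illustrative names.

```python
# (orig, optim) timings in nanoseconds, copied from the "without spaces" table.
TIMINGS_NS = {
    10: (167, 181),
    100: (621, 295),
    10**3: (5_150, 1_650),
    10**5: (500_000, 147_000),
}


def speedups(timings):
    """Map output size in bytes -> speedup factor (orig time / optim time)."""
    return {size: orig / optim for size, (orig, optim) in timings.items()}


for size, factor in speedups(TIMINGS_NS).items():
    print(f"{size:>6} bytes: {factor:.2f}x")
```

A factor below 1 for the 10-byte case is the small slowdown; the larger sizes land between about 2.1x and 3.4x.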

    Microbenchmark:

    Common platform:
    Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
    Timer: time.perf_counter
    CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    Python unicode implementation: PEP-393
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Bits: int=32, long=64, long long=64, size_t=64, void*=64

    Platform of campaign orig:
    SCM: hg revision=90e41d965228 tag=tip branch=default date="2015-10-14 10:10 +0200"
    Python version: 3.6.0a0 (default:90e41d965228, Oct 14 2015, 10:46:50) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-14 10:47:05
    Timer precision: 54 ns

    Platform of campaign optim:
    SCM: hg revision=90e41d965228+ tag=tip branch=default date="2015-10-14 10:10 +0200"
    Python version: 3.6.0a0 (default:90e41d965228+, Oct 14 2015, 11:07:24) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-14 11:09:53
    Timer precision: 62 ns

    -----------------------------------------+-------------+---------------
    without spaces                           | orig        | optim
    -----------------------------------------+-------------+---------------
    data = "AB" * 10; bytes.fromhex(data)    |  167 ns (*) |   181 ns (+9%)
    data = "AB" * 100; bytes.fromhex(data)   |  621 ns (*) |  295 ns (-52%)
    data = "AB" * 10**3; bytes.fromhex(data) | 5.15 us (*) | 1.65 us (-68%)
    data = "AB" * 10**5; bytes.fromhex(data) |  500 us (*) |  147 us (-71%)
    -----------------------------------------+-------------+---------------
    Total                                    |  506 us (*) |  149 us (-70%)
    -----------------------------------------+-------------+---------------

    ---------------------------------------------------+-------------+---------------
    with 0.5 space                                     | orig        | optim
    ---------------------------------------------------+-------------+---------------
    data = "ABAB " * (10 // 2); bytes.fromhex(data)    |  179 ns (*) |         186 ns
    data = "ABAB " * (100 // 2); bytes.fromhex(data)   |  659 ns (*) |  340 ns (-48%)
    data = "ABAB " * (10**3 // 2); bytes.fromhex(data) | 5.48 us (*) | 2.19 us (-60%)
    data = "ABAB " * (10**5 // 2); bytes.fromhex(data) |  529 us (*) |  194 us (-63%)
    ---------------------------------------------------+-------------+---------------
    Total                                              |  536 us (*) |  196 us (-63%)
    ---------------------------------------------------+-------------+---------------

    ------------------------------------------+-------------+---------------
    with 1 space                              | orig        | optim
    ------------------------------------------+-------------+---------------
    data = "AB " * 10; bytes.fromhex(data)    |  180 ns (*) |   191 ns (+6%)
    data = "AB " * 100; bytes.fromhex(data)   |  710 ns (*) |  330 ns (-54%)
    data = "AB " * 10**3; bytes.fromhex(data) | 5.77 us (*) | 1.99 us (-66%)
    data = "AB " * 10**5; bytes.fromhex(data) |  559 us (*) |  177 us (-68%)
    ------------------------------------------+-------------+---------------
    Total                                     |  565 us (*) |  179 us (-68%)
    ------------------------------------------+-------------+---------------

    ---------------+-------------+--------------
    Summary        | orig        | optim
    ---------------+-------------+--------------
    without spaces |  506 us (*) | 149 us (-70%)
    with 0.5 space |  536 us (*) | 196 us (-63%)
    with 1 space   |  565 us (*) | 179 us (-68%)
    ---------------+-------------+--------------
    Total          | 1.61 ms (*) | 525 us (-67%)
    ---------------+-------------+--------------
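
The contents of the attached bench_fromhex.py are not shown in this issue. A minimal sketch that times the same statements with the standard timeit module might look like the following; it is not the original script, the names CASES, SIZES and bench are hypothetical, and absolute timings will depend on the machine and Python version.

```python
import timeit

# Input generators mirroring the three groups in the tables above;
# each produces a string that decodes to n bytes (n even).
CASES = {
    "without spaces": lambda n: "AB" * n,            # no whitespace
    "with 0.5 space": lambda n: "ABAB " * (n // 2),  # one space per 2 bytes
    "with 1 space": lambda n: "AB " * n,             # one space per byte
}
SIZES = (10, 100, 10**3, 10**5)


def bench(sizes=SIZES, repeat=5):
    """Return {(label, size): best seconds per bytes.fromhex() call}."""
    results = {}
    for label, make in CASES.items():
        for size in sizes:
            timer = timeit.Timer("bytes.fromhex(data)", globals={"data": make(size)})
            loops, _ = timer.autorange()  # loop count taking at least 0.2 s
            best = min(timer.repeat(repeat=repeat, number=loops)) / loops
            results[label, size] = best
    return results
```

For example, running bench() on an unpatched and on a patched interpreter and comparing the per-call times gives the same kind of orig/optim comparison as the tables above.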

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 14, 2015

    New changeset 55d207a637ff by Victor Stinner in branch 'default':
    Optimize bytes.fromhex() and bytearray.fromhex()
    https://hg.python.org/cpython/rev/55d207a637ff

    @python-dev
    Mannequin

    python-dev mannequin commented Oct 14, 2015

    New changeset 09e0533f3694 by Victor Stinner in branch 'default':
    Issue bpo-25401: Remove now unused hex_digit_to_int() function
    https://hg.python.org/cpython/rev/09e0533f3694

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022