Use _PyBytesWriter for bytes%args #69536

Closed
vstinner opened this issue Oct 9, 2015 · 9 comments
Labels
performance Performance or resource usage

Comments

vstinner commented Oct 9, 2015

BPO 25349
Nosy @vstinner, @ethanfurman, @serhiy-storchaka
Files
  • bytes_format.patch
  • bench_bytes_format.py
  • bytes_formatlong.patch
  • bench_bytes_int.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-10-09.21:06:38.740>
    created_at = <Date 2015-10-09.00:50:25.364>
    labels = ['performance']
    title = 'Use _PyBytesWriter for bytes%args'
    updated_at = <Date 2016-04-26.10:36:43.249>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2016-04-26.10:36:43.249>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-10-09.21:06:38.740>
    closer = 'vstinner'
    components = []
    creation = <Date 2015-10-09.00:50:25.364>
    creator = 'vstinner'
    dependencies = []
    files = ['40724', '40726', '40732', '40735']
    hgrepos = []
    issue_num = 25349
    keywords = ['patch']
    message_count = 9.0
    messages = ['252577', '252578', '252596', '252597', '252629', '252650', '252655', '252657', '264246']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'ethan.furman', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue25349'
    versions = ['Python 3.6']

    vstinner commented Oct 9, 2015

    The attached patch is a work in progress that uses the new private _PyBytesWriter API in bytes % args.

    Using the _PyBytesWriter API will allow further optimizations. For example, it avoids the creation of a temporary bytes object to format b'%f' % 1.2.

    The _PyBytesWriter API allocates a small buffer of 512 bytes on the stack to delay the allocation of the final bytes object. It can avoid the need to call _PyBytes_Resize() completely, or at least reduce the number of calls.

    See also the issue bpo-25318 which added the _PyBytesWriter API.
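To make the buffering idea concrete, here is a toy model in pure Python. It is only a conceptual sketch: the real _PyBytesWriter is a private C API, and the class name `ToyBytesWriter`, its methods, and the `heap_allocations` counter are all hypothetical names invented for this illustration. The only figure taken from the issue is the 512-byte small buffer.

```python
class ToyBytesWriter:
    """Conceptual model only; not the real (private, C-level) _PyBytesWriter."""

    SMALL_BUFFER_SIZE = 512  # size quoted in the issue

    def __init__(self):
        # Stands in for the small buffer that the C code keeps on the stack.
        self.buf = bytearray(self.SMALL_BUFFER_SIZE)
        self.size = 0
        # Counts how often the data outgrew the current buffer.
        self.heap_allocations = 0

    def write(self, data):
        needed = self.size + len(data)
        if needed > len(self.buf):
            # Only now is a larger buffer allocated and the data copied over.
            grown = bytearray(needed)
            grown[:self.size] = self.buf[:self.size]
            self.buf = grown
            self.heap_allocations += 1
        self.buf[self.size:needed] = data
        self.size = needed

    def finish(self):
        # The final bytes object is created once, at the very end.
        return bytes(self.buf[:self.size])


writer = ToyBytesWriter()
writer.write(b"x=")
writer.write(b"1.200000")
print(writer.finish(), "heap allocations:", writer.heap_allocations)
```

In this model a short result never leaves the small buffer, which mirrors the claim that resize calls can be avoided entirely for small outputs.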

    vstinner added the performance label Oct 9, 2015

    vstinner commented Oct 9, 2015

    See also PEP 461, "Adding % formatting to bytes and bytearray".

    FYI, bytes % args is tested by test_format (good to know for testing changes quickly).
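For readers unfamiliar with PEP 461, a few simple cases of bytes % args are shown below. This is just an illustration of the public behaviour in Python 3.5 and later, not part of the patch; the `examples` dictionary and its keys are made up for this sketch.

```python
# Each entry pairs a description with the value of a bytes % args expression.
examples = {
    "plain %s": b"hello %s" % b"world",
    "left-justified %s": b"hello %-100s" % b"world",
    "integer %d": b"x=%d" % 123,
    "float %f": b"x=%f" % 1.2,
    "padded %d": b"x=%100d" % 123,
}

for name, value in examples.items():
    # Show only the length and the first bytes, since padded values are long.
    print(f"{name}: {len(value)} bytes, starts with {value[:12]!r}")
```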

    vstinner commented Oct 9, 2015

    bench_bytes_format.py: micro-benchmark testing a few formats. Some tests are focused on the implementation of _PyBytesWriter to ensure that the optimization is efficient.

    Except for a single test (which is not really relevant, since it takes less than 500 nanoseconds), all tests are faster.

    The b"xxxxxx %s" % b"y" test confirms that the optimization disabling overallocation for the last write is effective.
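The attached bench_bytes_format.py is not reproduced here. As a rough stand-in, the long-prefix case can be timed with the standard timeit module. The helper name `bench` and the repeat and loop counts are assumptions chosen for this sketch, and the numbers it prints will differ from the results reported in this issue.

```python
import timeit


def bench(stmt, setup, repeat=3, number=10_000):
    """Return the best per-call time of stmt, in nanoseconds."""
    timer = timeit.Timer(stmt, setup=setup)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


# A long literal prefix followed by a single short %s write.
for size in (10, 100, 10**3):
    setup = f'fmt = b"x" * {size} + b"%s"'
    ns = bench('fmt % b"y"', setup)
    print(f"prefix of {size:>5} bytes: {ns:8.1f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters for operations that take well under a microsecond.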

    Results:

    Common platform:
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    Python unicode implementation: PEP-393
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
    CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    Timer: time.perf_counter
    Bits: int=32, long=64, long long=64, size_t=64, void*=64

    Platform of campaign orig:
    SCM: hg revision=1aae9b6a6929 tag=tip branch=default date="2015-10-09 01:34 -0400"
    Timer precision: 64 ns
    Python version: 3.6.0a0 (default:1aae9b6a6929, Oct 9 2015, 11:33:56) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-09 11:34:11

    Platform of campaign writer:
    SCM: hg revision=fc2c11a19ae1+ tag=tip branch=default date="2015-10-09 11:48 +0200"
    Timer precision: 61 ns
    Python version: 3.6.0a0 (default:fc2c11a19ae1+, Oct 9 2015, 12:16:16) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-09 12:16:31

    ---------------------------+------------+--------------
    use smaller buffer         |       orig |        writer
    ---------------------------+------------+--------------
    b"hello %s" % b"world"     |  13 ns (*) |   12 ns (-5%)
    b"hello %-100s" % b"world" | 158 ns (*) |  98 ns (-38%)
    b"x=%d" % 123              |  13 ns (*) |         12 ns
    b"x=%f" % 1.2              |  13 ns (*) |         13 ns
    b"x=%100d" % 123           | 156 ns (*) |  166 ns (+7%)
    ---------------------------+------------+--------------
    Total                      | 353 ns (*) | 301 ns (-15%)
    ---------------------------+------------+--------------

    -------------------------------------------------+-------------+---------------
    "hello %s" % long_string                         |        orig |         writer
    -------------------------------------------------+-------------+---------------
    fmt = b"hello %s"; arg = b"x" * 10; fmt % arg    |   98 ns (*) |   86 ns (-12%)
    fmt = b"hello %s"; arg = b"x" * 100; fmt % arg   |   85 ns (*) |          87 ns
    fmt = b"hello %s"; arg = b"x" * 10**3; fmt % arg |  298 ns (*) |  208 ns (-30%)
    fmt = b"hello %s"; arg = b"x" * 10**5; fmt % arg |  4.8 us (*) |  4.39 us (-9%)
    -------------------------------------------------+-------------+---------------
    Total                                            | 5.28 us (*) | 4.77 us (-10%)
    -------------------------------------------------+-------------+---------------

    ---------------------------------------+-------------+---------------
    b"xxxxxx %s" % b"y"                    |        orig |         writer
    ---------------------------------------+-------------+---------------
    fmt = b"x" * 10 + b"%s"; fmt % b"y"    |   99 ns (*) |   81 ns (-18%)
    fmt = b"x" * 100 + b"%s"; fmt % b"y"   |  189 ns (*) |   87 ns (-54%)
    fmt = b"x" * 10**3 + b"%s"; fmt % b"y" | 1.12 us (*) |  209 ns (-81%)
    fmt = b"x" * 10**5 + b"%s"; fmt % b"y" | 88.4 us (*) | 8.49 us (-90%)
    ---------------------------------------+-------------+---------------
    Total                                  | 89.8 us (*) | 8.87 us (-90%)
    ---------------------------------------+-------------+---------------

    ----------------------------------------------------------+-------------+---------------
    %f                                                        |        orig |         writer
    ----------------------------------------------------------+-------------+---------------
    n = 200; fmt = b"%f" * n; arg = tuple([1.2]*n); fmt % arg | 37.2 us (*) | 29.6 us (-21%)
    ----------------------------------------------------------+-------------+---------------

    ------------------------------------------------------------+-------------+---------------
    %i                                                          |        orig |         writer
    ------------------------------------------------------------+-------------+---------------
    n = 200; fmt = b"%f" * n; arg = tuple([12345]*n); fmt % arg | 49.4 us (*) | 42.8 us (-13%)
    ------------------------------------------------------------+-------------+---------------

    -------------------------+-------------+---------------
    Summary                  |        orig |         writer
    -------------------------+-------------+---------------
    use smaller buffer       |  353 ns (*) |  301 ns (-15%)
    "hello %s" % long_string | 5.28 us (*) | 4.77 us (-10%)
    b"xxxxxx %s" % b"y"      | 89.8 us (*) | 8.87 us (-90%)
    %f                       | 37.2 us (*) | 29.6 us (-21%)
    %i                       | 49.4 us (*) | 42.8 us (-13%)
    -------------------------+-------------+---------------
    Total                    |  182 us (*) | 86.3 us (-53%)
    -------------------------+-------------+---------------

    python-dev mannequin commented Oct 9, 2015

    New changeset b2f3cbdc0f2d by Victor Stinner in branch 'default':
    Issue bpo-25349: Optimize bytes % args using the new private _PyBytesWriter API
    https://hg.python.org/cpython/rev/b2f3cbdc0f2d

    vstinner commented Oct 9, 2015

    bytes_formatlong.patch: Fast path for b'%d' % int and the other integer formatters. It avoids the creation of a temporary bytes object and writes directly into the writer, as '%d' % int already does for Unicode.
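Since the fast path only changes how the digits reach the output buffer, the visible result should stay identical to str formatting encoded as ASCII. A quick sanity check along those lines might look like the sketch below; the helper `same_as_str` and the chosen formats and values are illustrative assumptions, not the test suite's actual cases.

```python
def same_as_str(fmt, value):
    """Check that bytes formatting equals str formatting encoded as ASCII."""
    return fmt.encode("ascii") % value == (fmt % value).encode("ascii")


# Cross product of a few integer formats and small, negative, and large values.
checks = [
    (fmt, value)
    for fmt in ("%d", "%i", "%x", "%X", "%o", "x=%d ", "%5d", "%-5d|")
    for value in (0, 12345, -12345, 0xABCDEF, 10**50 - 1)
]
failures = [(fmt, value) for fmt, value in checks if not same_as_str(fmt, value)]
print("mismatches:", failures)
```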

    vstinner commented Oct 9, 2015

    I wrote the bench_bytes_int.py micro-benchmark; results are below.

    Oh, I didn't expect a real difference even for simple code like b'%d' % 12345 (32% faster). So I consider that gain enough to justify applying the optimization.

    Common platform:
    Timer: time.perf_counter
    Bits: int=32, long=64, long long=64, size_t=64, void*=64
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
    Python unicode implementation: PEP-393
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

    Platform of campaign orig:
    SCM: hg revision=576128c0d068 tag=tip branch=default date="2015-10-09 10:20 -0400"
    Python version: 3.6.0a0 (default:576128c0d068, Oct 9 2015, 22:36:21) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-09 22:36:36
    Timer precision: 62 ns

    Platform of campaign writer:
    Python version: 3.6.0a0 (default:576128c0d068+, Oct 9 2015, 22:28:09) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    Date: 2015-10-09 22:34:53
    SCM: hg revision=576128c0d068+ tag=tip branch=default date="2015-10-09 10:20 -0400"
    Timer precision: 65 ns

    ------------------------------------------------------------+-------------+---------------
    %i                                                          |        orig |         writer
    ------------------------------------------------------------+-------------+---------------
    n = 1; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  155 ns (*) |  105 ns (-32%)
    n = 5; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  546 ns (*) |  306 ns (-44%)
    n = 10; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 1.03 us (*) |  543 ns (-47%)
    n = 25; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 2.49 us (*) | 1.27 us (-49%)
    n = 100; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 10.1 us (*) | 5.25 us (-48%)
    n = 200; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 20.5 us (*) | 10.8 us (-47%)
    n = 500; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 48.8 us (*) | 24.6 us (-50%)
    ------------------------------------------------------------+-------------+---------------
    Total                                                       | 83.6 us (*) | 42.9 us (-49%)
    ------------------------------------------------------------+-------------+---------------

    ---------------------------------------------------------------+-------------+---------------
    x=%i                                                           |        orig |         writer
    ---------------------------------------------------------------+-------------+---------------
    n = 1; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg   |  173 ns (*) |  123 ns (-29%)
    n = 5; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg   |  602 ns (*) |  372 ns (-38%)
    n = 10; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg  | 1.14 us (*) |  668 ns (-42%)
    n = 25; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg  |  2.8 us (*) | 1.56 us (-44%)
    n = 100; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 11.1 us (*) | 6.12 us (-45%)
    n = 200; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 21.5 us (*) | 12.1 us (-44%)
    n = 500; fmt = b"x=%d " * n; arg = tuple([12345]*n); fmt % arg | 53.5 us (*) | 29.8 us (-44%)
    ---------------------------------------------------------------+-------------+---------------
    Total                                                          | 90.8 us (*) | 50.7 us (-44%)
    ---------------------------------------------------------------+-------------+---------------

    ------------------------------------------------------------+-------------+---------------
    %x                                                          |        orig |         writer
    ------------------------------------------------------------+-------------+---------------
    n = 1; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  155 ns (*) |  105 ns (-32%)
    n = 5; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg   |  545 ns (*) |  306 ns (-44%)
    n = 10; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 1.03 us (*) |  543 ns (-47%)
    n = 25; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg  | 2.49 us (*) | 1.26 us (-49%)
    n = 100; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg |  9.9 us (*) | 5.07 us (-49%)
    n = 200; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 19.8 us (*) | 10.1 us (-49%)
    n = 500; fmt = b"%d" * n; arg = tuple([12345]*n); fmt % arg | 48.9 us (*) | 24.5 us (-50%)
    ------------------------------------------------------------+-------------+---------------
    Total                                                       | 82.8 us (*) | 41.9 us (-49%)
    ------------------------------------------------------------+-------------+---------------

    ------------------------------------------------------------------+-------------+---------------
    x=%x                                                              |        orig |         writer
    ------------------------------------------------------------------+-------------+---------------
    n = 1; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg   |  183 ns (*) |  132 ns (-28%)
    n = 5; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg   |  651 ns (*) |  419 ns (-36%)
    n = 10; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg  | 1.23 us (*) |  761 ns (-38%)
    n = 25; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg  | 2.96 us (*) | 1.79 us (-40%)
    n = 100; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 11.9 us (*) | 7.13 us (-40%)
    n = 200; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 23.5 us (*) |   14 us (-41%)
    n = 500; fmt = b"x=%d " * n; arg = tuple([0xabcdef]*n); fmt % arg | 58.3 us (*) | 34.3 us (-41%)
    ------------------------------------------------------------------+-------------+---------------
    Total                                                             | 98.6 us (*) | 58.5 us (-41%)
    ------------------------------------------------------------------+-------------+---------------

    --------------------------------------------+-------------+--------------
    large int: %i                               |        orig |        writer
    --------------------------------------------+-------------+--------------
    fmt = b"%i"; arg = 10 ** 0 - 1; fmt % arg   |  115 ns (*) |  74 ns (-36%)
    fmt = b"%i"; arg = 10 ** 50 - 1; fmt % arg  |  288 ns (*) | 242 ns (-16%)
    fmt = b"%i"; arg = 10 ** 100 - 1; fmt % arg |  538 ns (*) |  494 ns (-8%)
    fmt = b"%i"; arg = 10 ** 150 - 1; fmt % arg |  865 ns (*) |  812 ns (-6%)
    fmt = b"%i"; arg = 10 ** 200 - 1; fmt % arg | 1.33 us (*) |       1.28 us
    --------------------------------------------+-------------+--------------
    Total                                       | 3.14 us (*) |  2.9 us (-8%)
    --------------------------------------------+-------------+--------------

    ----------------------------------------------+-------------+---------------
    large int: x=%i                               |        orig |         writer
    ----------------------------------------------+-------------+---------------
    fmt = b"x=%i"; arg = 10 ** 0 - 1; fmt % arg   |  140 ns (*) |  100 ns (-28%)
    fmt = b"x=%i"; arg = 10 ** 50 - 1; fmt % arg  |  298 ns (*) |  249 ns (-16%)
    fmt = b"x=%i"; arg = 10 ** 100 - 1; fmt % arg |  548 ns (*) |   502 ns (-8%)
    fmt = b"x=%i"; arg = 10 ** 150 - 1; fmt % arg |  874 ns (*) |   822 ns (-6%)
    ----------------------------------------------+-------------+---------------
    Total                                         | 1.86 us (*) | 1.67 us (-10%)
    ----------------------------------------------+-------------+---------------

    -------------------+-------------+---------------
    Summary            |        orig |         writer
    -------------------+-------------+---------------
    %i                 | 83.6 us (*) | 42.9 us (-49%)
    x=%i               | 90.8 us (*) | 50.7 us (-44%)
    %x                 | 82.8 us (*) | 41.9 us (-49%)
    x=%x               | 98.6 us (*) | 58.5 us (-41%)
    large int: %i      | 3.14 us (*) |   2.9 us (-8%)
    large int: x=%i    | 1.86 us (*) | 1.67 us (-10%)
    -------------------+-------------+---------------
    Total              |  363 us (*) |  201 us (-45%)
    -------------------+-------------+---------------

    python-dev mannequin commented Oct 9, 2015

    New changeset d9a89c9137d2 by Victor Stinner in branch 'default':
    Issue bpo-25349: Optimize bytes % int
    https://hg.python.org/cpython/rev/d9a89c9137d2

    New changeset 4d46d1588629 by Victor Stinner in branch 'default':
    Issue bpo-25349: Add fast path for b'%c' % int
    https://hg.python.org/cpython/rev/4d46d1588629

    vstinner commented Oct 9, 2015

    OK, I implemented all the optimizations that were already implemented in str % args. I'm closing the issue.

    vstinner closed this as completed Oct 9, 2015

    python-dev mannequin commented Apr 26, 2016

    New changeset 090502a0c69c by Victor Stinner in branch 'default':
    Issue bpo-25349, bpo-26249: Fix memleak in formatfloat()
    https://hg.python.org/cpython/rev/090502a0c69c

    ezio-melotti transferred this issue from another repository Apr 10, 2022