Rewrite StringIO to use the _PyUnicodeWriter API #59817

Closed
vstinner opened this issue Aug 10, 2012 · 12 comments
Labels
performance Performance or resource usage topic-IO topic-unicode

Comments

@vstinner
Member

BPO 15612
Nosy @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka
Files
  • stringio_unicode_writer.patch
  • bench_stringio.py
  • bench_stringio2.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-03-18.11:04:56.223>
    created_at = <Date 2012-08-10.02:30:29.691>
    labels = ['expert-unicode', 'expert-IO', 'performance']
    title = 'Rewrite StringIO to use the _PyUnicodeWriter API'
    updated_at = <Date 2015-03-18.11:04:56.221>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2015-03-18.11:04:56.221>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-03-18.11:04:56.223>
    closer = 'vstinner'
    components = ['Unicode', 'IO']
    creation = <Date 2012-08-10.02:30:29.691>
    creator = 'vstinner'
    dependencies = []
    files = ['26752', '26753', '26765']
    hgrepos = []
    issue_num = 15612
    keywords = ['patch']
    message_count = 12.0
    messages = ['167850', '167851', '167857', '167858', '167926', '167927', '167950', '167974', '167975', '167977', '167978', '238415']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'ezio.melotti', 'Arfrever', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue15612'
    versions = ['Python 3.4']

    @vstinner
    Member Author

    The attached patch rewrites the C implementation of StringIO to use the _PyUnicodeWriter API instead of the PyAccu API. It provides better performance when writing non-ASCII strings.

    The patch adds new functions:

    • _PyUnicodeWriter_Truncate()
    • _PyUnicodeWriter_WriteStrAt()
    • _PyUnicodeWriter_GetValue()

    @vstinner
    Member Author

    Results of my microbenchmark, run using the attached bench_stringio.py with benchmark.py:
    https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py

    Command:
    ./python benchmark.py script bench_stringio.py

    ----

    Common platform:
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Python unicode implementation: PEP-393
    Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne
    Bits: int=32, long=64, long long=64, pointer=64
    CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

    Platform of campaign pyaccu:
    Date: 2012-08-10 04:24:53
    SCM: hg revision=aaa68dce117e tag=tip branch=default date="2012-08-09 21:38 +0200"
    Python version: 3.3.0b1 (default:aaa68dce117e, Aug 10 2012, 04:24:19) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

    Platform of campaign writer:
    Date: 2012-08-10 04:23:21
    SCM: hg revision=aaa68dce117e+ tag=tip branch=default date="2012-08-09 21:38 +0200"
    Python version: 3.3.0b1 (default:aaa68dce117e+, Aug 10 2012, 04:18:39) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

    --------------------------------------+-------------+---------------
    Tests                                 | pyaccu      | writer
    --------------------------------------+-------------+---------------
    writer ascii                          | 30.4 ms (*) | 30.4 ms
    writer reader ascii                   | 37.1 ms (*) | 37 ms
    writer latin1                         | 31.5 ms (*) | 30.6 ms
    writer reader latin1                  | 38.6 ms (*) | 37.4 ms
    writer bmp                            | 31.8 ms (*) | 29.7 ms (-7%)
    writer reader bmp                     | 40.8 ms (*) | 36.6 ms (-10%)
    writer non-bmp                        | 33.4 ms (*) | 30.2 ms (-10%)
    writer reader non-bmp                 | 40.9 ms (*) | 36.7 ms (-10%)
    writer long lines ascii               | 7.96 ms (*) | 7.34 ms (-8%)
    writer-reader long lines ascii        | 8.16 ms (*) | 7.39 ms (-9%)
    writer long lines latin1              | 8.01 ms (*) | 7.4 ms (-8%)
    writer-reader long lines latin1       | 8.05 ms (*) | 7.4 ms (-8%)
    writer long lines bmp                 | 14 ms (*)   | 9.42 ms (-33%)
    writer-reader long lines bmp          | 14.2 ms (*) | 9.45 ms (-34%)
    writer long lines non-bmp             | 13.9 ms (*) | 9.62 ms (-31%)
    writer-reader long lines non-bmp      | 14.3 ms (*) | 9.63 ms (-32%)
    writer very long lines ascii          | 7.96 ms (*) | 7.36 ms (-7%)
    writer-reader very long lines ascii   | 8.05 ms (*) | 7.37 ms (-8%)
    writer very long lines latin1         | 7.98 ms (*) | 7.33 ms (-8%)
    writer-reader very long lines latin1  | 8 ms (*)    | 7.39 ms (-8%)
    writer very long lines bmp            | 14.1 ms (*) | 9.34 ms (-34%)
    writer-reader very long lines bmp     | 14.2 ms (*) | 9.4 ms (-34%)
    writer very long lines non-bmp        | 13.9 ms (*) | 9.5 ms (-32%)
    writer-reader very long lines non-bmp | 14 ms (*)   | 9.61 ms (-31%)
    reader ascii                          | 6.48 ms (*) | 6.22 ms
    reader latin1                         | 6.59 ms (*) | 6.57 ms
    reader bmp                            | 7.22 ms (*) | 6.9 ms
    reader non-bmp                        | 7.65 ms (*) | 7.31 ms
    --------------------------------------+-------------+---------------
    Total                                 | 489 ms (*)  | 431 ms (-12%)
    --------------------------------------+-------------+---------------

    @vstinner vstinner added the performance Performance or resource usage label Aug 10, 2012
    @vstinner vstinner changed the title Rewriter StringIO to use the _PyUnicodeWriter API Rewrite StringIO to use the _PyUnicodeWriter API Aug 10, 2012
    @pitrou
    Member

    pitrou commented Aug 10, 2012

    > It provides better performance when writing non-ASCII strings.

    I would like to know why that is the case. If PyUnicode_Join is not optimal, then perhaps we should better optimize it.

    Also, you should post benchmarks with tiny strings as well.

    @pitrou
    Member

    pitrou commented Aug 10, 2012

    > Also, you should post benchmarks with tiny strings as well.

    Oops, sorry, they are already there. Thanks for the numbers.

    @vstinner
    Member Author

    > I would like to know why that is the case.
    > If PyUnicode_Join is not optimal, then perhaps we should
    > better optimize it.

    I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may reduce the number of realloc(), and so the number of times that the buffer is copied.
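
To make the overallocation argument concrete, here is a minimal Python sketch (an illustration only, not the C code of _PyUnicodeWriter). It counts how often a growing buffer would have to be resized when chunks are appended one at a time, with and without a 25% margin. The function name `count_reallocs` and the chunk sizes are hypothetical.

```python
def count_reallocs(chunk_sizes, overallocate=0.25):
    """Count buffer resizes when appending chunks one at a time.

    overallocate=0.0 grows the buffer to exactly the needed size each time;
    overallocate=0.25 reserves 25% extra space on every resize.
    This is a simplified model, not the real _PyUnicodeWriter logic.
    """
    capacity = 0
    length = 0
    reallocs = 0
    for size in chunk_sizes:
        length += size
        if length > capacity:
            # Grow to the needed size plus the overallocation margin.
            capacity = length + int(length * overallocate)
            reallocs += 1
    return reallocs


chunks = [10] * 1000  # 1000 writes of 10 characters each
exact = count_reallocs(chunks, overallocate=0.0)
padded = count_reallocs(chunks, overallocate=0.25)
print(exact, padded)
```

In this model the exact-size strategy resizes on every write, while the padded strategy resizes only a few dozen times. Whether a real realloc() copies the data on each call depends on the allocator.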

    @pitrou
    Member

    pitrou commented Aug 10, 2012

    >> I would like to know why that is the case.
    >> If PyUnicode_Join is not optimal, then perhaps we should
    >> better optimize it.
    >
    > I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may
    > reduce the number of realloc(), and so the number of times that the
    > buffer is copied.

    But PyUnicode_Join doesn't realloc() anything, since it creates a buffer
    of exactly the right size. So this can't be the answer.
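
A rough Python analogue of the append/join strategy described here, as a sketch only: chunks are collected in a list and joined once at the end, so the final buffer is allocated a single time at its exact size. The class name `ListAccumulator` is hypothetical and does not mirror the PyAccu C API.

```python
class ListAccumulator:
    """Hypothetical stand-in for the append/join strategy."""

    def __init__(self):
        self._chunks = []

    def write(self, text):
        # Only a reference to the chunk is stored; no character data is copied.
        self._chunks.append(text)
        return len(text)

    def getvalue(self):
        # A single join builds the result in one step instead of
        # growing a buffer chunk by chunk.
        return "".join(self._chunks)


acc = ListAccumulator()
for chunk in ("abc\n", "\u20ac\n"):
    acc.write(chunk)
result = acc.getvalue()
```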

    @pitrou
    Member

    pitrou commented Aug 11, 2012

    Victor, your benchmark is buggy (it writes one character at a time). You should apply the following patch:

    $ diff -u bench_stringio_orig.py bench_stringio.py 
    --- bench_stringio_orig.py	2012-08-11 12:02:16.528321958 +0200
    +++ bench_stringio.py	2012-08-11 12:05:53.939536902 +0200
    @@ -41,8 +41,8 @@
             ('bmp', '\u20ac' * k + '\n'),
             ('non-bmp', '\U0010ffff' * k + '\n'),
         ):
    -        bench.bench_func('writer long lines %s' % charset, writer, n // k, text)
    -        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n // k, text)
    +        bench.bench_func('writer long lines %s' % charset, writer, n, [text])
    +        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n, [text])
     
         for charset, text in (
             ('ascii', 'a' * (n // 10) + '\n'),
    @@ -50,8 +50,8 @@
             ('bmp', '\u20ac' * (n // 10) + '\n'),
             ('non-bmp', '\U0010ffff' * (n // 10) + '\n'),
         ):
    -        bench.bench_func('writer very long lines %s' % charset, writer, 10, text)
    -        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 10, text)
    +        bench.bench_func('writer very long lines %s' % charset, writer, 100, [text])
    +        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 100, [text])
     
         data = 'abc\n' * n
         bench.bench_func('reader ascii', reader, data)
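
The effect of passing `text` instead of `[text]` can be reproduced with a small sketch. The real `writer` function in bench_stringio.py is not shown in this thread, so the helper below is an assumption about its shape: it loops over its second argument and writes each item.

```python
import io


def writer(n, lines):
    """Hypothetical benchmark helper: write every item of `lines`, n times."""
    out = io.StringIO()
    writes = 0
    for _ in range(n):
        for line in lines:  # iterating over a str yields single characters
            out.write(line)
            writes += 1
    return writes, out.getvalue()


text = "a" * 5 + "\n"
calls_str, value_str = writer(2, text)      # str passed directly
calls_list, value_list = writer(2, [text])  # list containing one line
print(calls_str, calls_list)
```

Both calls produce the same final string, but the first issues one write() per character while the second issues one write() per line, which is what the patched benchmark measures.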

    @vstinner
    Member Author

    > Victor, your benchmark is buggy (it writes one character at a time).

    Oh, it's not what I wanted to test.

    I attach a new benchmark. Here are the results. PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines.

    Common platform:
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    Bits: int=32, long=64, long long=64, pointer=64
    Python unicode implementation: PEP-393
    Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne

    Platform of campaign pyaccu:
    SCM: hg revision=9804aec74d4a tag=tip branch=default date="2012-08-10 18:55 -0400"
    Date: 2012-08-11 16:53:46
    Python version: 3.3.0b1 (default:9804aec74d4a, Aug 11 2012, 16:53:12) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

    Platform of campaign writer:
    SCM: hg revision=9804aec74d4a+ tag=tip branch=default date="2012-08-10 18:55 -0400"
    Date: 2012-08-11 16:50:40
    Python version: 3.3.0b1 (default:9804aec74d4a+, Aug 11 2012, 16:33:18) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

    --------------------------------------+-------------+---------------
    10 lines                              | pyaccu      | writer
    --------------------------------------+-------------+---------------
    reader short line ascii               | 1.53 us (*) | 1.46 us
    writer short line ascii               | 4.85 us (*) | 4.48 us (-8%)
    writer-reader short line ascii        | 6.45 us (*) | 5.71 us (-12%)
    reader short line latin1              | 1.57 us (*) | 1.45 us (-8%)
    writer short line latin1              | 4.92 us (*) | 4.56 us (-7%)
    writer-reader short line latin1       | 6.6 us (*)  | 5.78 us (-13%)
    reader short line bmp                 | 1.64 us (*) | 1.54 us (-6%)
    writer short line bmp                 | 5.01 us (*) | 4.43 us (-12%)
    writer-reader short line bmp          | 6.68 us (*) | 5.71 us (-14%)
    reader short line non-bmp             | 1.61 us (*) | 1.59 us
    writer short line non-bmp             | 5.1 us (*)  | 4.55 us (-11%)
    writer-reader short line non-bmp      | 6.74 us (*) | 5.66 us (-16%)
    reader long lines ascii               | 103 us (*)  | 33.4 us (-68%)
    writer long lines ascii               | 998 ns (*)  | 836 ns (-16%)
    writer-reader long lines ascii        | 1.45 us (*) | 1.18 us (-19%)
    reader long lines latin1              | 105 us (*)  | 34.2 us (-67%)
    writer long lines latin1              | 997 ns (*)  | 831 ns (-17%)
    writer-reader long lines latin1       | 1.47 us (*) | 1.2 us (-18%)
    reader long lines bmp                 | 121 us (*)  | 85.9 us (-29%)
    writer long lines bmp                 | 995 ns (*)  | 861 ns (-13%)
    writer-reader long lines bmp          | 1.43 us (*) | 1.13 us (-21%)
    reader long lines non-bmp             | 97.1 us (*) | 99.7 us
    writer long lines non-bmp             | 1 us (*)    | 819 ns (-18%)
    writer-reader long lines non-bmp      | 1.4 us (*)  | 1.18 us (-16%)
    reader very long lines ascii          | 1.42 us (*) | 1.45 us
    writer very long lines ascii          | 3.04 us (*) | 2.88 us (-5%)
    writer-reader very long lines ascii   | 4.59 us (*) | 4.12 us (-10%)
    reader very long lines latin1         | 1.57 us (*) | 1.47 us (-7%)
    writer very long lines latin1         | 3.04 us (*) | 2.73 us (-10%)
    writer-reader very long lines latin1  | 4.66 us (*) | 4.04 us (-13%)
    reader very long lines bmp            | 1.55 us (*) | 1.55 us
    writer very long lines bmp            | 3.03 us (*) | 2.91 us
    writer-reader very long lines bmp     | 4.72 us (*) | 4.08 us (-14%)
    reader very long lines non-bmp        | 1.55 us (*) | 1.49 us
    writer very long lines non-bmp        | 3.09 us (*) | 2.93 us (-5%)
    writer-reader very long lines non-bmp | 4.59 us (*) | 4.06 us (-12%)
    --------------------------------------+-------------+---------------
    Total                                 | 525 us (*)  | 342 us (-35%)
    --------------------------------------+-------------+---------------

    --------------------------------------+-------------+---------------
    1000 lines                            | pyaccu      | writer
    --------------------------------------+-------------+---------------
    reader short line ascii               | 68.2 us (*) | 66.1 us
    writer short line ascii               | 308 us (*)  | 307 us
    writer-reader short line ascii        | 378 us (*)  | 374 us
    reader short line latin1              | 72 us (*)   | 68.5 us
    writer short line latin1              | 324 us (*)  | 313 us
    writer-reader short line latin1       | 395 us (*)  | 383 us
    reader short line bmp                 | 74.8 us (*) | 71.9 us
    writer short line bmp                 | 326 us (*)  | 303 us (-7%)
    writer-reader short line bmp          | 397 us (*)  | 378 us
    reader short line non-bmp             | 72.9 us (*) | 72.6 us
    writer short line non-bmp             | 329 us (*)  | 304 us (-8%)
    writer-reader short line non-bmp      | 397 us (*)  | 383 us
    reader long lines ascii               | 104 us (*)  | 33.8 us (-67%)
    writer long lines ascii               | 1.99 us (*) | 2.52 us (+27%)
    writer-reader long lines ascii        | 4.37 us (*) | 3.45 us (-21%)
    reader long lines latin1              | 104 us (*)  | 33.3 us (-68%)
    writer long lines latin1              | 2.07 us (*) | 2.55 us (+23%)
    writer-reader long lines latin1       | 4.51 us (*) | 3.57 us (-21%)
    reader long lines bmp                 | 120 us (*)  | 80.5 us (-33%)
    writer long lines bmp                 | 2.15 us (*) | 2.55 us (+18%)
    writer-reader long lines bmp          | 4.71 us (*) | 3.86 us (-18%)
    reader long lines non-bmp             | 90.6 us (*) | 97.6 us (+8%)
    writer long lines non-bmp             | 2.18 us (*) | 2.68 us (+23%)
    writer-reader long lines non-bmp      | 4.24 us (*) | 4.05 us
    reader very long lines ascii          | 2.53 us (*) | 1.66 us (-34%)
    writer very long lines ascii          | 3.07 us (*) | 3.46 us (+13%)
    writer-reader very long lines ascii   | 6.18 us (*) | 4.89 us (-21%)
    reader very long lines latin1         | 2.57 us (*) | 1.75 us (-32%)
    writer very long lines latin1         | 3.16 us (*) | 3.46 us (+10%)
    writer-reader very long lines latin1  | 6.32 us (*) | 4.98 us (-21%)
    reader very long lines bmp            | 2.7 us (*)  | 2.34 us (-14%)
    writer very long lines bmp            | 3.52 us (*) | 3.65 us
    writer-reader very long lines bmp     | 6.73 us (*) | 5.7 us (-15%)
    reader very long lines non-bmp        | 2.45 us (*) | 2.35 us
    writer very long lines non-bmp        | 3.47 us (*) | 3.87 us (+12%)
    writer-reader very long lines non-bmp | 5.98 us (*) | 5.85 us
    --------------------------------------+-------------+---------------
    Total                                 | 3.63 ms (*) | 3.34 ms (-8%)
    --------------------------------------+-------------+---------------

    --------------------------------------+-------------+---------------
    100000 lines                          | pyaccu      | writer
    --------------------------------------+-------------+---------------
    reader short line ascii               | 6.74 ms (*) | 6.43 ms
    writer short line ascii               | 30.7 ms (*) | 29.8 ms
    writer-reader short line ascii        | 37.5 ms (*) | 36.6 ms
    reader short line latin1              | 7.08 ms (*) | 6.64 ms (-6%)
    writer short line latin1              | 31.3 ms (*) | 30.1 ms
    writer-reader short line latin1       | 38.8 ms (*) | 37.5 ms
    reader short line bmp                 | 7.46 ms (*) | 6.98 ms (-6%)
    writer short line bmp                 | 32 ms (*)   | 29 ms (-9%)
    writer-reader short line bmp          | 40.5 ms (*) | 35.9 ms (-11%)
    reader short line non-bmp             | 7.36 ms (*) | 7.23 ms
    writer short line non-bmp             | 33.3 ms (*) | 29.4 ms (-12%)
    writer-reader short line non-bmp      | 40.5 ms (*) | 36.5 ms (-10%)
    reader long lines ascii               | 103 us (*)  | 32.6 us (-68%)
    writer long lines ascii               | 59.4 us (*) | 66.5 us (+12%)
    writer-reader long lines ascii        | 220 us (*)  | 99.2 us (-55%)
    reader long lines latin1              | 105 us (*)  | 32.2 us (-69%)
    writer long lines latin1              | 60.2 us (*) | 67.3 us (+12%)
    writer-reader long lines latin1       | 240 us (*)  | 97.6 us (-59%)
    reader long lines bmp                 | 122 us (*)  | 76.9 us (-37%)
    writer long lines bmp                 | 62.1 us (*) | 73.8 us (+19%)
    writer-reader long lines bmp          | 242 us (*)  | 151 us (-38%)
    reader long lines non-bmp             | 95.7 us (*) | 92.1 us
    writer long lines non-bmp             | 76.5 us (*) | 90.3 us (+18%)
    writer-reader long lines non-bmp      | 198 us (*)  | 173 us (-12%)
    reader very long lines ascii          | 91.6 us (*) | 11.5 us (-87%)
    writer very long lines ascii          | 7.15 us (*) | 11.9 us (+67%)
    writer-reader very long lines ascii   | 145 us (*)  | 20.1 us (-86%)
    reader very long lines latin1         | 110 us (*)  | 12 us (-89%)
    writer very long lines latin1         | 7.52 us (*) | 12.1 us (+61%)
    writer-reader very long lines latin1  | 165 us (*)  | 20.7 us (-87%)
    reader very long lines bmp            | 91.1 us (*) | 46.7 us (-49%)
    writer very long lines bmp            | 12.3 us (*) | 22.5 us (+82%)
    writer-reader very long lines bmp     | 150 us (*)  | 61.9 us (-59%)
    reader very long lines non-bmp        | 66.8 us (*) | 66.6 us
    writer very long lines non-bmp        | 22.4 us (*) | 38.4 us (+72%)
    writer-reader very long lines non-bmp | 108 us (*)  | 87.7 us (-19%)
    --------------------------------------+-------------+---------------
    Total                                 | 316 ms (*)  | 294 ms (-7%)
    --------------------------------------+-------------+---------------

    -------------+-------------+--------------
    Summary      | pyaccu      | writer
    -------------+-------------+--------------
    10 lines     | 525 us (*)  | 342 us (-35%)
    1000 lines   | 3.63 ms (*) | 3.34 ms (-8%)
    100000 lines | 316 ms (*)  | 294 ms (-7%)
    -------------+-------------+--------------
    Total        | 320 ms (*)  | 297 ms (-7%)
    -------------+-------------+--------------
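
For readers checking the percentages: they appear to be the change of the writer timing relative to the pyaccu reference column, so a negative value means writer was faster. The sketch below reproduces a few of them. The function name and the 5% cut-off for omitting small differences are assumptions inferred from the tables, not taken from benchmark.py.

```python
def relative_change(reference, candidate, threshold=5.0):
    """Return the change of candidate vs. reference as a signed percent string.

    Differences below `threshold` percent give an empty string
    (the 5% cut-off is an assumption inferred from the tables).
    """
    change = (candidate - reference) / reference * 100.0
    if abs(change) < threshold:
        return ""
    return f"{change:+.0f}%"


print(relative_change(525, 342))    # summary row "10 lines", in us
print(relative_change(1.99, 2.52))  # "writer long lines ascii", 1000 lines, in us
```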

    @vstinner
    Member Author

    "PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines."

    Oh... I added colors to my tool, but there was a bug: I used the wrong colors... It's just the opposite.

    _PyUnicodeWriter is almost always faster, except when writing more than 100,000 very long lines.

    @pitrou
    Member

    pitrou commented Aug 11, 2012

    > _PyUnicodeWriter is almost always faster

    Actually, PyAccu is consistently faster for the "writer" case, while _PyUnicodeWriter is faster for the "writer-reader" case.
    This is not because of PyAccu, but because of the way StringIO uses it: when e.g. readline() is called, the PyAccu result is converted into a PyUCS4* buffer, then each readline() result is converted again by finding the max char in the sub-buffer.

    So I would suggest using PyAccu, but converting its result to a _PyUnicodeWriter rather than a PyUCS4* buffer.
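
The "finding the max char" step can be illustrated in Python, as a sketch only. Under PEP 393 a string is stored with 1, 2 or 4 bytes per character depending on its widest code point, so building a compact string from a UCS4 sub-buffer requires scanning it first. The helper name `storage_kind` is hypothetical.

```python
def storage_kind(text):
    """Bytes per character a PEP 393 string needs for its widest code point."""
    max_char = max(map(ord, text), default=0)
    if max_char < 0x100:
        return 1  # ASCII or Latin-1
    if max_char < 0x10000:
        return 2  # BMP
    return 4      # non-BMP


buffer = "abc\n\u20ac\u20ac\n\U0010ffff\n"
lines = buffer.splitlines(keepends=True)
kinds = [storage_kind(line) for line in lines]
print(kinds)
```

Each line returned to the caller needs its own scan in this model, which is the per-readline() cost described above.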

    @serhiy-storchaka
    Member

    See the benchmark results in bpo-15381 (the patch is not applicable to StringIO). These numbers show that the resize strategy can be much slower than the append/join strategy on Windows.

    @vstinner
    Member Author

    I'm no longer interested in working on this issue, and it's not clear that _PyUnicodeWriter is always faster. Switching from a list to _PyUnicodeWriter on a specific event would make the code much more complex. I prefer to just close the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022