Rewrite StringIO to use the _PyUnicodeWriter API #59817

vstinner · 2012-08-10T02:30:30Z

BPO	15612
Nosy	@pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka
Files	stringio_unicode_writer.patch bench_stringio.py bench_stringio2.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2015-03-18.11:04:56.223>
created_at = <Date 2012-08-10.02:30:29.691>
labels = ['expert-unicode', 'expert-IO', 'performance']
title = 'Rewrite StringIO to use the _PyUnicodeWriter API'
updated_at = <Date 2015-03-18.11:04:56.221>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2015-03-18.11:04:56.221>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2015-03-18.11:04:56.223>
closer = 'vstinner'
components = ['Unicode', 'IO']
creation = <Date 2012-08-10.02:30:29.691>
creator = 'vstinner'
dependencies = []
files = ['26752', '26753', '26765']
hgrepos = []
issue_num = 15612
keywords = ['patch']
message_count = 12.0
messages = ['167850', '167851', '167857', '167858', '167926', '167927', '167950', '167974', '167975', '167977', '167978', '238415']
nosy_count = 5.0
nosy_names = ['pitrou', 'vstinner', 'ezio.melotti', 'Arfrever', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue15612'
versions = ['Python 3.4']

vstinner · 2012-08-10T02:30:22Z

The attached patch rewrites the C implementation of StringIO to use the _PyUnicodeWriter API instead of the PyAccu API. It provides better performance when writing non-ASCII strings.

The patch adds new functions:

_PyUnicodeWriter_Truncate()
_PyUnicodeWriter_WriteStrAt()
_PyUnicodeWriter_GetValue()

vstinner · 2012-08-10T02:32:18Z

Results of my micro benchmark. Use the attached bench_stringio.py with benchmark.py:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py

Command:
./python benchmark.py script bench_stringio.py
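
The attached bench_stringio.py is not reproduced in this thread. As a rough stand-in, the sketch below times StringIO writes for the four character ranges named in the results (ASCII, Latin-1, BMP, non-BMP) using only the standard library. The helper names (`write_lines`, `time_charsets`), the Latin-1 sample, and the line counts are assumptions for illustration, not the contents of the attached script.

```python
import io
import timeit

# Sample lines for each character range named in the benchmark tables.
# The Latin-1 sample is an assumption; the BMP and non-BMP characters
# are the ones used in the benchmark diff later in the thread.
SAMPLES = {
    "ascii": "abc\n",
    "latin1": "\xe9\xe8\xe0\n",
    "bmp": "\u20ac\u20ac\u20ac\n",
    "non-bmp": "\U0010ffff\U0010ffff\U0010ffff\n",
}


def write_lines(line, count):
    """Write `line` to a fresh StringIO `count` times and return the result."""
    buf = io.StringIO()
    for _ in range(count):
        buf.write(line)
    return buf.getvalue()


def time_charsets(count=1000, repeat=5):
    """Return the best wall-clock time in seconds for each charset."""
    results = {}
    for name, line in SAMPLES.items():
        timer = timeit.Timer(lambda line=line: write_lines(line, count))
        results[name] = min(timer.repeat(repeat=repeat, number=10))
    return results


if __name__ == "__main__":
    for name, seconds in time_charsets().items():
        print(f"{name:8s} {seconds * 1e3:.3f} ms")
```

Absolute timings from such a script depend on the machine and the Python build, so only the relative differences between charsets are meaningful.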

----

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP-393
Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne
Bits: int=32, long=64, long long=64, pointer=64
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

Platform of campaign pyaccu:
Date: 2012-08-10 04:24:53
SCM: hg revision=aaa68dce117e tag=tip branch=default date="2012-08-09 21:38 +0200"
Python version: 3.3.0b1 (default:aaa68dce117e, Aug 10 2012, 04:24:19) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

Platform of campaign writer:
Date: 2012-08-10 04:23:21
SCM: hg revision=aaa68dce117e+ tag=tip branch=default date="2012-08-09 21:38 +0200"
Python version: 3.3.0b1 (default:aaa68dce117e+, Aug 10 2012, 04:18:39) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

--------------------------------------+-------------+---------------
Tests                                 | pyaccu      | writer
--------------------------------------+-------------+---------------
writer ascii                          | 30.4 ms (*) | 30.4 ms
writer reader ascii                   | 37.1 ms (*) | 37 ms
writer latin1                         | 31.5 ms (*) | 30.6 ms
writer reader latin1                  | 38.6 ms (*) | 37.4 ms
writer bmp                            | 31.8 ms (*) | 29.7 ms (-7%)
writer reader bmp                     | 40.8 ms (*) | 36.6 ms (-10%)
writer non-bmp                        | 33.4 ms (*) | 30.2 ms (-10%)
writer reader non-bmp                 | 40.9 ms (*) | 36.7 ms (-10%)
writer long lines ascii               | 7.96 ms (*) | 7.34 ms (-8%)
writer-reader long lines ascii        | 8.16 ms (*) | 7.39 ms (-9%)
writer long lines latin1              | 8.01 ms (*) | 7.4 ms (-8%)
writer-reader long lines latin1       | 8.05 ms (*) | 7.4 ms (-8%)
writer long lines bmp                 | 14 ms (*)   | 9.42 ms (-33%)
writer-reader long lines bmp          | 14.2 ms (*) | 9.45 ms (-34%)
writer long lines non-bmp             | 13.9 ms (*) | 9.62 ms (-31%)
writer-reader long lines non-bmp      | 14.3 ms (*) | 9.63 ms (-32%)
writer very long lines ascii          | 7.96 ms (*) | 7.36 ms (-7%)
writer-reader very long lines ascii   | 8.05 ms (*) | 7.37 ms (-8%)
writer very long lines latin1         | 7.98 ms (*) | 7.33 ms (-8%)
writer-reader very long lines latin1  | 8 ms (*)    | 7.39 ms (-8%)
writer very long lines bmp            | 14.1 ms (*) | 9.34 ms (-34%)
writer-reader very long lines bmp     | 14.2 ms (*) | 9.4 ms (-34%)
writer very long lines non-bmp        | 13.9 ms (*) | 9.5 ms (-32%)
writer-reader very long lines non-bmp | 14 ms (*)   | 9.61 ms (-31%)
reader ascii                          | 6.48 ms (*) | 6.22 ms
reader latin1                         | 6.59 ms (*) | 6.57 ms
reader bmp                            | 7.22 ms (*) | 6.9 ms
reader non-bmp                        | 7.65 ms (*) | 7.31 ms
--------------------------------------+-------------+---------------
Total                                 | 489 ms (*)  | 431 ms (-12%)
--------------------------------------+-------------+---------------

pitrou · 2012-08-10T08:10:01Z

> It provides better performance when writing non-ASCII strings.

I would like to know why that is the case. If PyUnicode_Join is not optimal, then perhaps we should better optimize it.

Also, you should post benchmarks with tiny strings as well.

pitrou · 2012-08-10T08:12:08Z

> Also, you should post benchmarks with tiny strings as well.

Oops, sorry, they are already there. Thanks for the numbers.

vstinner · 2012-08-10T23:12:28Z

> I would like to know why that is the case.
> If PyUnicode_Join is not optimal, then perhaps we should
> better optimize it.

I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may reduce the number of realloc(), and so the number of times that the buffer is copied.
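
The effect of overallocation on the number of reallocations can be sketched with a small simulation. This is a model, not CPython's code: the function name `count_reallocs` and the growth rule (grow to the needed size plus 25%) are assumptions based only on the "+25%" figure mentioned here.

```python
def count_reallocs(chunk_sizes, overallocate=0.25):
    """Count how often a growable buffer must be resized.

    The buffer grows to the needed size plus `overallocate` (a fraction)
    whenever an append does not fit. With overallocate=0 the buffer is
    resized on every append that adds data.
    """
    capacity = 0
    used = 0
    reallocs = 0
    for size in chunk_sizes:
        needed = used + size
        if needed > capacity:
            # Grow past what is needed so later appends may fit for free.
            capacity = needed + int(needed * overallocate)
            reallocs += 1
        used = needed
    return reallocs


if __name__ == "__main__":
    chunks = [4] * 1000  # 1000 short writes of 4 characters each
    print("exact fit:", count_reallocs(chunks, overallocate=0.0))
    print("+25%:", count_reallocs(chunks, overallocate=0.25))
```

In this model the exact-fit buffer is resized once per write, while the overallocating buffer is resized only a logarithmic number of times, since each growth step multiplies the capacity.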

pitrou · 2012-08-10T23:20:23Z

>> I would like to know why that is the case.
>> If PyUnicode_Join is not optimal, then perhaps we should
>> better optimize it.
>
> I don't know. _PyUnicodeWriter overallocates its buffer (+25%). It may
> reduce the number of realloc(), and so the number of times that the
> buffer is copied.

But PyUnicode_Join doesn't realloc() anything, since it creates a buffer
of exactly the right size. So this can't be the answer.
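
The point about exact-size allocation can be illustrated in pure Python: a join can sum the lengths of its pieces first and then fill a single preallocated buffer, so nothing is ever resized. The function `join_exact` is a hypothetical model of that two-pass idea, not the C implementation of PyUnicode_Join.

```python
def join_exact(pieces):
    """Concatenate strings using one buffer of exactly the right size."""
    # Pass 1: compute the total length so the buffer never has to grow.
    total = sum(len(p) for p in pieces)
    buf = [""] * total  # stand-in for a preallocated character buffer

    # Pass 2: copy each piece into its final position.
    pos = 0
    for p in pieces:
        buf[pos:pos + len(p)] = p  # same-length slice assignment, no resize
        pos += len(p)

    # Converting back to str is only needed because this model uses a list.
    return "".join(buf)
```

For example, `join_exact(["ab", "", "c"])` allocates three slots once and fills them in place.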

pitrou · 2012-08-11T10:10:55Z

Victor, your benchmark is buggy (it writes one character at a time). You should apply the following patch:

$ diff -u bench_stringio_orig.py bench_stringio.py 
--- bench_stringio_orig.py	2012-08-11 12:02:16.528321958 +0200
+++ bench_stringio.py	2012-08-11 12:05:53.939536902 +0200
@@ -41,8 +41,8 @@
         ('bmp', '\u20ac' * k + '\n'),
         ('non-bmp', '\U0010ffff' * k + '\n'),
     ):
-        bench.bench_func('writer long lines %s' % charset, writer, n // k, text)
-        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n // k, text)
+        bench.bench_func('writer long lines %s' % charset, writer, n, [text])
+        bench.bench_func('writer-reader long lines %s' % charset, writer_reader, n, [text])
 
     for charset, text in (
         ('ascii', 'a' * (n // 10) + '\n'),
@@ -50,8 +50,8 @@
         ('bmp', '\u20ac' * (n // 10) + '\n'),
         ('non-bmp', '\U0010ffff' * (n // 10) + '\n'),
     ):
-        bench.bench_func('writer very long lines %s' % charset, writer, 10, text)
-        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 10, text)
+        bench.bench_func('writer very long lines %s' % charset, writer, 100, [text])
+        bench.bench_func('writer-reader very long lines %s' % charset, writer_reader, 100, [text])
 
     data = 'abc\n' * n
     bench.bench_func('reader ascii', reader, data)
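
Why passing a bare string makes the benchmark write one character at a time: if the writer helper loops over its argument and calls write() on each item, a str argument yields single characters, while a one-element list yields the whole line. The `writer` function below is a guess at the shape of the helper in bench_stringio.py, which is not shown in the thread; `CountingStringIO` is a hypothetical subclass added only to count calls.

```python
import io


class CountingStringIO(io.StringIO):
    """StringIO that records how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, s):
        self.write_calls += 1
        return super().write(s)


def writer(n, lines):
    """Assumed shape of the benchmark helper: write every item n times."""
    buf = CountingStringIO()
    for _ in range(n):
        for line in lines:
            buf.write(line)
    return buf


if __name__ == "__main__":
    text = "a" * 9 + "\n"
    print(writer(1, text).write_calls)    # str: iterated char by char
    print(writer(1, [text]).write_calls)  # list: one write per line
```

Both calls produce the same final contents; only the number of write() calls differs, which is what skews the "long lines" measurements.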

vstinner · 2012-08-11T15:31:16Z

> Victor, your benchmark is buggy (it writes one character at a time).

Oh, it's not what I wanted to test.

I attach a new benchmark. Here are the results. PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines.

Common platform:
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, pointer=64
Python unicode implementation: PEP-393
Platform: Linux-3.4.4-4.fc16.x86_64-x86_64-with-fedora-16-Verne

Platform of campaign pyaccu:
SCM: hg revision=9804aec74d4a tag=tip branch=default date="2012-08-10 18:55 -0400"
Date: 2012-08-11 16:53:46
Python version: 3.3.0b1 (default:9804aec74d4a, Aug 11 2012, 16:53:12) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

Platform of campaign writer:
SCM: hg revision=9804aec74d4a+ tag=tip branch=default date="2012-08-10 18:55 -0400"
Date: 2012-08-11 16:50:40
Python version: 3.3.0b1 (default:9804aec74d4a+, Aug 11 2012, 16:33:18) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]

--------------------------------------+-------------+---------------
10 lines                              | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 1.53 us (*) | 1.46 us
writer short line ascii               | 4.85 us (*) | 4.48 us (-8%)
writer-reader short line ascii        | 6.45 us (*) | 5.71 us (-12%)
reader short line latin1              | 1.57 us (*) | 1.45 us (-8%)
writer short line latin1              | 4.92 us (*) | 4.56 us (-7%)
writer-reader short line latin1       | 6.6 us (*)  | 5.78 us (-13%)
reader short line bmp                 | 1.64 us (*) | 1.54 us (-6%)
writer short line bmp                 | 5.01 us (*) | 4.43 us (-12%)
writer-reader short line bmp          | 6.68 us (*) | 5.71 us (-14%)
reader short line non-bmp             | 1.61 us (*) | 1.59 us
writer short line non-bmp             | 5.1 us (*)  | 4.55 us (-11%)
writer-reader short line non-bmp      | 6.74 us (*) | 5.66 us (-16%)
reader long lines ascii               | 103 us (*)  | 33.4 us (-68%)
writer long lines ascii               | 998 ns (*)  | 836 ns (-16%)
writer-reader long lines ascii        | 1.45 us (*) | 1.18 us (-19%)
reader long lines latin1              | 105 us (*)  | 34.2 us (-67%)
writer long lines latin1              | 997 ns (*)  | 831 ns (-17%)
writer-reader long lines latin1       | 1.47 us (*) | 1.2 us (-18%)
reader long lines bmp                 | 121 us (*)  | 85.9 us (-29%)
writer long lines bmp                 | 995 ns (*)  | 861 ns (-13%)
writer-reader long lines bmp          | 1.43 us (*) | 1.13 us (-21%)
reader long lines non-bmp             | 97.1 us (*) | 99.7 us
writer long lines non-bmp             | 1 us (*)    | 819 ns (-18%)
writer-reader long lines non-bmp      | 1.4 us (*)  | 1.18 us (-16%)
reader very long lines ascii          | 1.42 us (*) | 1.45 us
writer very long lines ascii          | 3.04 us (*) | 2.88 us (-5%)
writer-reader very long lines ascii   | 4.59 us (*) | 4.12 us (-10%)
reader very long lines latin1         | 1.57 us (*) | 1.47 us (-7%)
writer very long lines latin1         | 3.04 us (*) | 2.73 us (-10%)
writer-reader very long lines latin1  | 4.66 us (*) | 4.04 us (-13%)
reader very long lines bmp            | 1.55 us (*) | 1.55 us
writer very long lines bmp            | 3.03 us (*) | 2.91 us
writer-reader very long lines bmp     | 4.72 us (*) | 4.08 us (-14%)
reader very long lines non-bmp        | 1.55 us (*) | 1.49 us
writer very long lines non-bmp        | 3.09 us (*) | 2.93 us (-5%)
writer-reader very long lines non-bmp | 4.59 us (*) | 4.06 us (-12%)
--------------------------------------+-------------+---------------
Total                                 | 525 us (*)  | 342 us (-35%)
--------------------------------------+-------------+---------------

--------------------------------------+-------------+---------------
1000 lines                            | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 68.2 us (*) | 66.1 us
writer short line ascii               | 308 us (*)  | 307 us
writer-reader short line ascii        | 378 us (*)  | 374 us
reader short line latin1              | 72 us (*)   | 68.5 us
writer short line latin1              | 324 us (*)  | 313 us
writer-reader short line latin1       | 395 us (*)  | 383 us
reader short line bmp                 | 74.8 us (*) | 71.9 us
writer short line bmp                 | 326 us (*)  | 303 us (-7%)
writer-reader short line bmp          | 397 us (*)  | 378 us
reader short line non-bmp             | 72.9 us (*) | 72.6 us
writer short line non-bmp             | 329 us (*)  | 304 us (-8%)
writer-reader short line non-bmp      | 397 us (*)  | 383 us
reader long lines ascii               | 104 us (*)  | 33.8 us (-67%)
writer long lines ascii               | 1.99 us (*) | 2.52 us (+27%)
writer-reader long lines ascii        | 4.37 us (*) | 3.45 us (-21%)
reader long lines latin1              | 104 us (*)  | 33.3 us (-68%)
writer long lines latin1              | 2.07 us (*) | 2.55 us (+23%)
writer-reader long lines latin1       | 4.51 us (*) | 3.57 us (-21%)
reader long lines bmp                 | 120 us (*)  | 80.5 us (-33%)
writer long lines bmp                 | 2.15 us (*) | 2.55 us (+18%)
writer-reader long lines bmp          | 4.71 us (*) | 3.86 us (-18%)
reader long lines non-bmp             | 90.6 us (*) | 97.6 us (+8%)
writer long lines non-bmp             | 2.18 us (*) | 2.68 us (+23%)
writer-reader long lines non-bmp      | 4.24 us (*) | 4.05 us
reader very long lines ascii          | 2.53 us (*) | 1.66 us (-34%)
writer very long lines ascii          | 3.07 us (*) | 3.46 us (+13%)
writer-reader very long lines ascii   | 6.18 us (*) | 4.89 us (-21%)
reader very long lines latin1         | 2.57 us (*) | 1.75 us (-32%)
writer very long lines latin1         | 3.16 us (*) | 3.46 us (+10%)
writer-reader very long lines latin1  | 6.32 us (*) | 4.98 us (-21%)
reader very long lines bmp            | 2.7 us (*)  | 2.34 us (-14%)
writer very long lines bmp            | 3.52 us (*) | 3.65 us
writer-reader very long lines bmp     | 6.73 us (*) | 5.7 us (-15%)
reader very long lines non-bmp        | 2.45 us (*) | 2.35 us
writer very long lines non-bmp        | 3.47 us (*) | 3.87 us (+12%)
writer-reader very long lines non-bmp | 5.98 us (*) | 5.85 us
--------------------------------------+-------------+---------------
Total                                 | 3.63 ms (*) | 3.34 ms (-8%)
--------------------------------------+-------------+---------------

--------------------------------------+-------------+---------------
100000 lines                          | pyaccu      | writer
--------------------------------------+-------------+---------------
reader short line ascii               | 6.74 ms (*) | 6.43 ms
writer short line ascii               | 30.7 ms (*) | 29.8 ms
writer-reader short line ascii        | 37.5 ms (*) | 36.6 ms
reader short line latin1              | 7.08 ms (*) | 6.64 ms (-6%)
writer short line latin1              | 31.3 ms (*) | 30.1 ms
writer-reader short line latin1       | 38.8 ms (*) | 37.5 ms
reader short line bmp                 | 7.46 ms (*) | 6.98 ms (-6%)
writer short line bmp                 | 32 ms (*)   | 29 ms (-9%)
writer-reader short line bmp          | 40.5 ms (*) | 35.9 ms (-11%)
reader short line non-bmp             | 7.36 ms (*) | 7.23 ms
writer short line non-bmp             | 33.3 ms (*) | 29.4 ms (-12%)
writer-reader short line non-bmp      | 40.5 ms (*) | 36.5 ms (-10%)
reader long lines ascii               | 103 us (*)  | 32.6 us (-68%)
writer long lines ascii               | 59.4 us (*) | 66.5 us (+12%)
writer-reader long lines ascii        | 220 us (*)  | 99.2 us (-55%)
reader long lines latin1              | 105 us (*)  | 32.2 us (-69%)
writer long lines latin1              | 60.2 us (*) | 67.3 us (+12%)
writer-reader long lines latin1       | 240 us (*)  | 97.6 us (-59%)
reader long lines bmp                 | 122 us (*)  | 76.9 us (-37%)
writer long lines bmp                 | 62.1 us (*) | 73.8 us (+19%)
writer-reader long lines bmp          | 242 us (*)  | 151 us (-38%)
reader long lines non-bmp             | 95.7 us (*) | 92.1 us
writer long lines non-bmp             | 76.5 us (*) | 90.3 us (+18%)
writer-reader long lines non-bmp      | 198 us (*)  | 173 us (-12%)
reader very long lines ascii          | 91.6 us (*) | 11.5 us (-87%)
writer very long lines ascii          | 7.15 us (*) | 11.9 us (+67%)
writer-reader very long lines ascii   | 145 us (*)  | 20.1 us (-86%)
reader very long lines latin1         | 110 us (*)  | 12 us (-89%)
writer very long lines latin1         | 7.52 us (*) | 12.1 us (+61%)
writer-reader very long lines latin1  | 165 us (*)  | 20.7 us (-87%)
reader very long lines bmp            | 91.1 us (*) | 46.7 us (-49%)
writer very long lines bmp            | 12.3 us (*) | 22.5 us (+82%)
writer-reader very long lines bmp     | 150 us (*)  | 61.9 us (-59%)
reader very long lines non-bmp        | 66.8 us (*) | 66.6 us
writer very long lines non-bmp        | 22.4 us (*) | 38.4 us (+72%)
writer-reader very long lines non-bmp | 108 us (*)  | 87.7 us (-19%)
--------------------------------------+-------------+---------------
Total                                 | 316 ms (*)  | 294 ms (-7%)
--------------------------------------+-------------+---------------

-------------+-------------+--------------
Summary      | pyaccu      | writer
-------------+-------------+--------------
10 lines     | 525 us (*)  | 342 us (-35%)
1000 lines   | 3.63 ms (*) | 3.34 ms (-8%)
100000 lines | 316 ms (*)  | 294 ms (-7%)
-------------+-------------+--------------
Total        | 320 ms (*)  | 297 ms (-7%)
-------------+-------------+--------------

vstinner · 2012-08-11T15:35:26Z

"PyAccu looks much more appropriate than _PyUnicodeWriter, because it is always faster, except to write 100.000 very long lines."

Oh... I added colors to my tool, but there was a bug: I used the wrong colors... It's just the opposite.

_PyUnicodeWriter is almost always faster, except when writing more than 100,000 very long lines.

pitrou · 2012-08-11T16:19:36Z

> _PyUnicodeWriter is almost always faster

Actually, PyAccu is consistently faster for the "writer" case, while _PyUnicodeWriter is faster for the "writer-reader" case.
This is not because of PyAccu, but because of the way StringIO uses it: when e.g. readline() is called, the PyAccu result is converted into a PyUCS4* buffer, then each readline() result is converted again by finding the max char in the sub-buffer.

So I would suggest using PyAccu, but converting its result to a _PyUnicodeWriter rather than a PyUCS4* buffer.
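
The cost of "finding the max char" follows from PEP 393: a str is stored with 1, 2, or 4 bytes per character depending on its largest code point, so building a str from a UCS4 buffer requires a scan of the sub-buffer first. The sketch below models that scan in Python; `char_width` and `readline_from_ucs4` are hypothetical names, and the real work happens in C.

```python
def char_width(code_points):
    """Return the bytes per character needed for the given code points."""
    max_char = max(code_points, default=0)  # the scan over the sub-buffer
    if max_char < 0x100:
        return 1  # ASCII / Latin-1
    if max_char < 0x10000:
        return 2  # rest of the BMP
    return 4      # non-BMP


def readline_from_ucs4(buf, pos):
    """Model of readline() on a UCS4 buffer (a list of code points).

    Returns (line, width, new_pos). Each call rescans its own slice to
    pick the storage width, which is the repeated conversion described
    in the comment above.
    """
    try:
        end = buf.index(ord("\n"), pos) + 1
    except ValueError:
        end = len(buf)  # no newline left: return the rest of the buffer
    chunk = buf[pos:end]
    return "".join(map(chr, chunk)), char_width(chunk), end
```

Every line read this way pays for one extra pass over its characters, on top of the earlier conversion of the whole accumulated content into the UCS4 buffer.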

serhiy-storchaka · 2012-08-11T16:45:33Z

See the benchmark results in bpo-15381 (the patch is not applicable to StringIO). These numbers show that the resize strategy can be much slower than the append/join strategy on Windows.

vstinner · 2015-03-18T11:04:56Z

I'm no longer interested in working on this issue, and it's not clear that _PyUnicodeWriter is always faster. Switching from a list to _PyUnicodeWriter on a specific event would make the code much more complex. I prefer to just close the issue.

vstinner added topic-unicode topic-IO labels Aug 10, 2012

vstinner added the performance label Aug 10, 2012

vstinner changed the title ~~Rewriter StringIO to use the _PyUnicodeWriter API~~ Rewrite StringIO to use the _PyUnicodeWriter API Aug 10, 2012

vstinner closed this as completed Mar 18, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
