classification
Title: Use _PyUnicodeWriter API in text decoders
Type: performance
Stage: resolved
Components:
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: loewis, python-dev, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2012-10-24 18:38 by vstinner, last changed 2012-11-07 22:53 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
codecs_writer.patch vstinner, 2012-10-24 18:38 review
codecs_writer_2.patch serhiy.storchaka, 2012-10-31 13:10 review
decodebench.res serhiy.storchaka, 2012-10-31 13:14 Benchmark results
Messages (9)
msg173695 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-24 18:38
The attached patch modifies the text decoders to use the _PyUnicodeWriter API, to factor out common code. It removes the unicode_widen() and unicode_putchar() functions.

 * Don't overallocate by default (except for the "raw-unicode-escape" codec); enable overallocation on the first decode error (as is done currently)
 * _PyUnicodeWriter_Prepare() only overallocates by 25%, instead of 100%, for unicode_decode_call_errorhandler()
 * Use _PyUnicodeWriter_Prepare() + PyUnicode_WRITE() (two macros) instead of unicode_putchar() (a function)
 * The _PyUnicodeWriter structure stores many useful fields, so we don't have to pass multiple parameters to functions, only the writer

I wrote the patch to factor out the code, but it may also be faster.
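The allocation strategy described above can be modeled in a few lines of Python. This is only a hypothetical sketch of the policy (the real _PyUnicodeWriter is implemented in C, and the class and method names below are invented for illustration): allocate exactly the expected size up front, and switch to 25% overallocation only once the first decode error occurs.

```python
class WriterSketch:
    """Toy model of the _PyUnicodeWriter growth policy (hypothetical names)."""

    def __init__(self, expected):
        # Allocate exactly the expected size: no overallocation by default.
        self.buf = [None] * expected
        self.pos = 0
        self.overallocate = False  # flipped on the first decode error

    def prepare(self, extra):
        """Ensure room for `extra` more characters (models _PyUnicodeWriter_Prepare)."""
        needed = self.pos + extra
        if needed <= len(self.buf):
            return
        if self.overallocate:
            # Grow by 25% instead of doubling (100%).
            needed += needed // 4
        self.buf.extend([None] * (needed - len(self.buf)))

    def write_char(self, ch):
        self.prepare(1)
        self.buf[self.pos] = ch
        self.pos += 1

w = WriterSketch(expected=4)
for ch in "abcd":
    w.write_char(ch)
assert len(w.buf) == 4      # exact fit, nothing wasted on the fast path
w.overallocate = True       # a decode error occurred
w.write_char("?")           # write a replacement character
assert len(w.buf) == 6      # 5 needed, plus 25% overallocation
```

The point of the policy is that valid input (the common case) never pays for a buffer it does not use, while error-heavy input amortizes its repeated growth.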
msg173697 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-24 19:44
Soon I'll post a patch which speeds up the unicode-escape and raw-unicode-escape decoders by 1.5-3x. There are also not-yet-reviewed patches for the UTF-32 (issue14625) and charmap (issue14850) decoders, so there will be merge conflicts.

But I will review the patch.
msg174171 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-30 01:02
"Soon I'll post a patch, which speeds up unicode-escape and raw-unicode-escape decoders to 1.5-3x. Also there are not yet reviewed patches for UTF-32 (issue14625) and charmap (issue14850) decoders. Will be merge conflicts."

codecs_writer.patch doesn't change the core of the decoders much; it mostly changes the code before and after the loop, and the error handling. You can still use PyUnicode_WRITE, PyUnicode_READ, memcpy(), etc.

"But I will review the patch."

If you review the patch, please check how the buffer is allocated. It should not be overallocated by default, only on the first error. Overallocation can kill performance when it is not necessary (especially on Windows).
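The error path in question is the one taken when an error handler runs; it is easy to see from Python with the standard codec machinery. Valid input stays on the fast path, while the first invalid byte hands control to the error handler, which is the point where the decoder starts overallocating its internal buffer:

```python
# Valid data takes the fast path: the result has exactly the expected size.
assert b"abc".decode("utf-8") == "abc"

# An invalid byte triggers the error handler ('replace' writes U+FFFD);
# only from this point on does the decoder enable overallocation.
assert b"ab\xff".decode("utf-8", "replace") == "ab\ufffd"

# 'ignore' also goes through the error-handling path.
assert b"ab\xff".decode("utf-8", "ignore") == "ab"
```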
msg174238 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 23:17
I will do some experiments and review tomorrow.
msg174273 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-31 12:50
I updated the patch to resolve the conflict with issue14625.
msg174275 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-31 13:58
With the patch, the UTF-8 decoder is 20% slower for some data. The UTF-16 decoder is 20% faster for some data and 20% slower for other data. The UTF-32 decoder is slower for many inputs (even after some optimization; the naive code was up to 50% slower). The standard charmap decoder is 10% slower. Only UTF-7, unicode-escape and raw-unicode-escape have become much faster (unicode-escape and raw-unicode-escape as much as with the issue16334 patch).

Well-optimized decoders do not benefit from _PyUnicodeWriter; they only get a slight slowdown. The patch requires some optimization (as for the UTF-32 decoder) to reduce the negative effect. Non-optimized decoders receive a great benefit.
msg174293 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-31 15:30
I ran the decodebench.py and bench-diff.py scripts from #14624; I just replaced repeat=10 with repeat=100 to get more reliable numbers. I only see some performance regressions between -5% and -1%, but there are some speedups on UTF-8 and UTF-32 (between +11% and +14%). On a microbenchmark, numbers in the -10%..+10% range just mean "no change".

Using _PyUnicodeWriter should not change performance at all on valid data; it only affects the performance of handling decoding errors, where the overallocation factor, the code to widen the buffer, and the code to write replacement characters differ.
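The scripts themselves are attached to #14624 and are not reproduced here; a minimal micro-benchmark in the same spirit (a sketch using the stdlib timeit module rather than the actual decodebench.py harness, with the function name bench() invented for illustration) could look like:

```python
import timeit

def bench(codec, text, number=1000, repeat=5):
    """Time decoding `text` encoded with `codec`; return the best run in seconds."""
    data = text.encode(codec)
    return min(timeit.repeat(lambda: data.decode(codec), number=number, repeat=repeat))

for codec in ("ascii", "latin1", "utf-8", "utf-16-le", "utf-32-le"):
    best = bench(codec, "A" * 10000)
    print(f"{codec:10} {best:.6f}s")
```

Taking the minimum over several repeats, as timeit.repeat() allows, is the usual way to reduce scheduler noise; even so, single-digit percentage differences on such a loop are within measurement error, which is the point made above.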
msg175034 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-11-06 23:41
New changeset 7ed9993d53b4 by Victor Stinner in branch 'default':
Close #16311: Use the _PyUnicodeWriter API in text decoders
http://hg.python.org/cpython/rev/7ed9993d53b4
msg175129 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-11-07 22:53
Oh, I forgot my benchmark results.

decodebench.py results on Linux 32 bits:
(Linux-3.2.0-32-generic-pae-i686-with-debian-wheezy-sid)

$ ./python bench-diff.py original writer
ascii     'A'*10000                       4109 (-3%)    3974

latin1    'A'*10000                       3851 (-5%)    3644
latin1    '\x80'*10000                    14832 (-3%)   14430

utf-8     'A'*10000                       3747 (-4%)    3608
utf-8     '\x80'*10000                    976 (-2%)     961
utf-8     '\u0100'*10000                  974 (-2%)     959
utf-8     '\u8000'*10000                  804 (-14%)    694
utf-8     '\U00010000'*10000              666 (-5%)     635

utf-16le  'A'*10000                       4154 (-1%)    4117
utf-16le  '\x80'*10000                    4055 (-2%)    3988
utf-16le  '\u0100'*10000                  4047 (-2%)    3974
utf-16le  '\u8000'*10000                  917 (-1%)     912
utf-16le  '\U00010000'*10000              872 (-0%)     870

utf-16be  'A'*10000                       3218 (-1%)    3185
utf-16be  '\x80'*10000                    3163 (-2%)    3114
utf-16be  '\u0100'*10000                  2591 (-1%)    2556
utf-16be  '\u8000'*10000                  979 (-1%)     974
utf-16be  '\U00010000'*10000              928 (-0%)     925

utf-32le  'A'*10000                       1681 (+12%)   1885
utf-32le  '\x80'*10000                    1697 (+10%)   1865
utf-32le  '\u0100'*10000                  2224 (+1%)    2254
utf-32le  '\u8000'*10000                  2224 (+2%)    2269
utf-32le  '\U00010000'*10000              2234 (+1%)    2260

utf-32be  'A'*10000                       1685 (+11%)   1868
utf-32be  '\x80'*10000                    1684 (+10%)   1860
utf-32be  '\u0100'*10000                  2223 (+1%)    2253
utf-32be  '\u8000'*10000                  2222 (+1%)    2255
utf-32be  '\U00010000'*10000              2243 (+1%)    2257

decodebench.py results on Linux 64 bits:
(Linux-3.4.9-2.fc16.x86_64-x86_64-with-fedora-16-Verne)

ascii     'A'*10000                       10043 (+1%)   10144

latin1    'A'*10000                       8351 (-1%)    8258
latin1    '\x80'*10000                    19184 (+2%)   19560

utf-8     'A'*10000                       8083 (+5%)    8461
utf-8     '\x80'*10000                    982 (+1%)     993
utf-8     '\u0100'*10000                  984 (+1%)     992
utf-8     '\u8000'*10000                  806 (+31%)    1053
utf-8     '\U00010000'*10000              639 (+12%)    718

utf-16le  'A'*10000                       5547 (-2%)    5422
utf-16le  '\x80'*10000                    5205 (+1%)    5271
utf-16le  '\u0100'*10000                  4900 (-4%)    4695
utf-16le  '\u8000'*10000                  1062 (+9%)    1154
utf-16le  '\U00010000'*10000              1040 (+4%)    1078

utf-16be  'A'*10000                       5416 (-5%)    5157
utf-16be  '\x80'*10000                    5077 (-1%)    5011
utf-16be  '\u0100'*10000                  4261 (-1%)    4218
utf-16be  '\u8000'*10000                  1146 (+0%)    1147
utf-16be  '\U00010000'*10000              1125 (-1%)    1119

utf-32le  'A'*10000                       1743 (+8%)    1880
utf-32le  '\x80'*10000                    1751 (+5%)    1842
utf-32le  '\u0100'*10000                  2114 (+29%)   2721
utf-32le  '\u8000'*10000                  2120 (+28%)   2718
utf-32le  '\U00010000'*10000              2065 (+30%)   2690

utf-32be  'A'*10000                       1761 (+6%)    1860
utf-32be  '\x80'*10000                    1749 (+6%)    1856
utf-32be  '\u0100'*10000                  2101 (+29%)   2715
utf-32be  '\u8000'*10000                  2083 (+30%)   2715
utf-32be  '\U00010000'*10000              2058 (+31%)   2689

Most significant changes:
 * -14% to decode '\u8000'*10000 from UTF-8 on Linux 32 bits
 * +31% to decode '\u8000'*10000 from UTF-8 on Linux 64 bits
 * +28% to +31% to decode UCS-2 and UCS-4 characters from UTF-32 on Linux 64 bits

@Serhiy Storchaka: If you feel able to tune _PyUnicodeWriter to
improve its performance, please open a new issue.

I consider the performance changes acceptable and I don't plan to work
on this topic.
History
Date User Action Args
2012-11-07 22:53:54vstinnersetmessages: + msg175129
2012-11-06 23:41:02python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg175034

resolution: fixed
stage: resolved
2012-10-31 15:30:41vstinnersetmessages: + msg174293
2012-10-31 13:58:45serhiy.storchakasetmessages: + msg174275
2012-10-31 13:14:31serhiy.storchakasetfiles: + decodebench.res
2012-10-31 13:10:30serhiy.storchakasetfiles: - codecs_writer_2.patch
2012-10-31 13:10:19serhiy.storchakasetfiles: + codecs_writer_2.patch
2012-10-31 12:50:05serhiy.storchakasetfiles: + codecs_writer_2.patch

messages: + msg174273
2012-10-30 23:17:20serhiy.storchakasetmessages: + msg174238
2012-10-30 01:02:32vstinnersetmessages: + msg174171
2012-10-24 19:44:32serhiy.storchakasetmessages: + msg173697
2012-10-24 18:38:53vstinnersetnosy: + loewis, serhiy.storchaka
2012-10-24 18:38:21vstinnercreate