classification
Title: Faster utf-16 decoder
Type: performance
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, asvetlov, ezio.melotti, haypo, loewis, pitrou, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2012-04-19 20:59 by serhiy.storchaka, last changed 2012-05-19 09:02 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
decode_utf16.patch serhiy.storchaka, 2012-04-19 20:58 review
decodebench.py serhiy.storchaka, 2012-04-23 21:01
bench-diff.py serhiy.storchaka, 2012-04-23 21:01
decode_utf16_2.patch serhiy.storchaka, 2012-05-03 13:21 review
decode_utf16_3.patch serhiy.storchaka, 2012-05-11 19:24 review
decode_utf16_4.patch serhiy.storchaka, 2012-05-14 22:14 review
decode_utf16_5.patch serhiy.storchaka, 2012-05-15 21:29 review
decode_utf16_6.patch serhiy.storchaka, 2012-05-15 21:29 review
Messages (15)
msg158748 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-19 20:58
I propose a patch that speeds up the UTF-16 decoder. With PEP 393, the UTF-16 decoder became 3-4x slower; this patch restores performance to the level of Python 3.2 and even exceeds it (+10-30% over 3.2).

In addition, it fixes a few bugs in the UTF-16 decoder. As a side effect, other decoders may also become faster.
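
For context, PEP 393 stores each string with 1, 2 or 4 bytes per character depending on its widest character, so the decoder has to pick the narrowest representation for its output. The sketch below is only an illustration, not part of the patch: it estimates the per-character storage on CPython 3.3+ with `sys.getsizeof`. The helper name `bytes_per_char` is made up for this example, and the result is a CPython implementation detail.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string of `ch` by n chars.

    Hypothetical helper for illustration; relies on CPython's PEP 393 layout.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One sample character from each range used in benchmarks of this kind.
for ch in ['A', '\x80', '\u0100', '\u8000', '\U00010000']:
    print(ascii(ch), bytes_per_char(ch))
```

ASCII and Latin-1 characters take one byte each, other BMP characters two, and non-BMP characters four.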
msg158751 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-04-19 21:03
See also #14625 for UTF-32 decoder.
msg158753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-19 21:09
See also issue #14579 for utf-16 decoder bugs.
msg158772 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-19 23:08
Serhiy: can you please submit a contributor form?
msg159077 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-23 21:01
Here are the benchmark results (numbers in MB/s; the percentage in parentheses is the speedup of the patched build relative to that column).

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                          Py2.7        Py3.2        Py3.3       patch

utf-16le  'A'*10000                       504 (+282%)  1905 (+1%)   565 (+241%)  1927
utf-16le  '\x80'*10000                    503 (+264%)  1894 (-3%)   417 (+340%)  1833
utf-16le    '\x80'+'A'*9999               504 (+264%)  1890 (-3%)   422 (+335%)  1834
utf-16le  '\u0100'*10000                  503 (+249%)  1896 (-7%)   357 (+391%)  1754
utf-16le    '\u0100'+'A'*9999             504 (+252%)  1896 (-6%)   360 (+393%)  1776
utf-16le    '\u0100'+'\x80'*9999          503 (+249%)  1890 (-7%)   357 (+392%)  1756
utf-16le  '\u8000'*10000                  503 (-18%)   355 (+16%)   75 (+449%)   412
utf-16le    '\u8000'+'A'*9999             504 (+254%)  1892 (-6%)   359 (+397%)  1783
utf-16le    '\u8000'+'\x80'*9999          503 (+249%)  1896 (-7%)   357 (+392%)  1755
utf-16le    '\u8000'+'\u0100'*9999        503 (+258%)  1901 (-5%)   359 (+402%)  1802
utf-16le  '\U00010000'*10000              484 (-14%)   379 (+9%)    103 (+303%)  415
utf-16le    '\U00010000'+'A'*9999         504 (+244%)  1905 (-9%)   353 (+392%)  1735
utf-16le    '\U00010000'+'\x80'*9999      503 (+245%)  1899 (-9%)   348 (+398%)  1733
utf-16le    '\U00010000'+'\u0100'*9999    503 (+244%)  1882 (-8%)   348 (+397%)  1729
utf-16le    '\U00010000'+'\u8000'*9999    503 (-18%)   355 (+16%)   71 (+482%)   413

utf-16be  'A'*10000                       504 (+284%)  1553 (+24%)  469 (+312%)  1933
utf-16be  '\x80'*10000                    504 (+251%)  1551 (+14%)  387 (+357%)  1770
utf-16be    '\x80'+'A'*9999               504 (+261%)  1549 (+17%)  386 (+371%)  1819
utf-16be  '\u0100'*10000                  503 (+175%)  1544 (-10%)  333 (+316%)  1384
utf-16be    '\u0100'+'A'*9999             505 (+178%)  1548 (-9%)   335 (+319%)  1403
utf-16be    '\u0100'+'\x80'*9999          503 (+179%)  1552 (-9%)   336 (+318%)  1405
utf-16be  '\u8000'*10000                  503 (-2%)    415 (+19%)   75 (+559%)   494
utf-16be    '\u8000'+'A'*9999             504 (+179%)  1551 (-9%)   335 (+320%)  1408
utf-16be    '\u8000'+'\x80'*9999          504 (+178%)  1551 (-10%)  336 (+317%)  1402
utf-16be    '\u8000'+'\u0100'*9999        504 (+179%)  1549 (-9%)   336 (+318%)  1404
utf-16be  '\U00010000'*10000              483 (-7%)    407 (+10%)   105 (+326%)  447
utf-16be    '\U00010000'+'A'*9999         504 (+149%)  1554 (-19%)  317 (+295%)  1253
utf-16be    '\U00010000'+'\x80'*9999      503 (+153%)  1543 (-17%)  317 (+302%)  1275
utf-16be    '\U00010000'+'\u0100'*9999    503 (+153%)  1537 (-17%)  317 (+302%)  1274
utf-16be    '\U00010000'+'\u8000'*9999    503 (-2%)    415 (+19%)   71 (+597%)   495

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                          Py2.7        Py3.2        Py3.3       patch

utf-16le  'A'*10000                       136 (+417%)  584 (+20%)   184 (+282%)  703
utf-16le  '\x80'*10000                    136 (+392%)  580 (+15%)   160 (+318%)  669
utf-16le    '\x80'+'A'*9999               136 (+398%)  582 (+16%)   159 (+326%)  677
utf-16le  '\u0100'*10000                  137 (+346%)  583 (+5%)    129 (+374%)  611
utf-16le    '\u0100'+'A'*9999             136 (+358%)  582 (+7%)    129 (+383%)  623
utf-16le    '\u0100'+'\x80'*9999          136 (+348%)  580 (+5%)    129 (+372%)  609
utf-16le  '\u8000'*10000                  136 (+18%)   127 (+27%)   38 (+324%)   161
utf-16le    '\u8000'+'A'*9999             136 (+357%)  582 (+7%)    129 (+382%)  622
utf-16le    '\u8000'+'\x80'*9999          136 (+351%)  581 (+6%)    128 (+380%)  614
utf-16le    '\u8000'+'\u0100'*9999        136 (+349%)  581 (+5%)    129 (+374%)  611
utf-16le  '\U00010000'*10000              153 (-3%)    140 (+6%)    53 (+181%)   149
utf-16le    '\U00010000'+'A'*9999         136 (+296%)  581 (-7%)    131 (+311%)  538
utf-16le    '\U00010000'+'\x80'*9999      136 (+289%)  584 (-9%)    131 (+304%)  529
utf-16le    '\U00010000'+'\u0100'*9999    136 (+290%)  579 (-8%)    130 (+308%)  530
utf-16le    '\U00010000'+'\u8000'*9999    136 (+25%)   128 (+33%)   38 (+347%)   170

utf-16be  'A'*10000                       136 (+331%)  441 (+33%)   166 (+253%)  586
utf-16be  '\x80'*10000                    136 (+309%)  440 (+26%)   145 (+283%)  556
utf-16be    '\x80'+'A'*9999               136 (+312%)  442 (+27%)   145 (+286%)  560
utf-16be  '\u0100'*10000                  136 (+231%)  441 (+2%)    120 (+275%)  450
utf-16be    '\u0100'+'A'*9999             136 (+232%)  442 (+2%)    120 (+276%)  451
utf-16be    '\u0100'+'\x80'*9999          136 (+231%)  438 (+3%)    119 (+278%)  450
utf-16be  '\u8000'*10000                  136 (+22%)   127 (+31%)   38 (+337%)   166
utf-16be    '\u8000'+'A'*9999             136 (+232%)  439 (+3%)    120 (+276%)  451
utf-16be    '\u8000'+'\x80'*9999          136 (+230%)  439 (+2%)    120 (+274%)  449
utf-16be    '\u8000'+'\u0100'*9999        136 (+232%)  439 (+3%)    120 (+276%)  451
utf-16be  '\U00010000'*10000              153 (-1%)    139 (+9%)    52 (+192%)   152
utf-16be    '\U00010000'+'A'*9999         136 (+211%)  440 (-4%)    121 (+250%)  423
utf-16be    '\U00010000'+'\x80'*9999      136 (+210%)  440 (-4%)    122 (+246%)  422
utf-16be    '\U00010000'+'\u0100'*9999    136 (+210%)  441 (-5%)    121 (+248%)  421
utf-16be    '\U00010000'+'\u8000'*9999    136 (+27%)   128 (+35%)   38 (+355%)   173
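
A minimal sketch of how throughput figures like these could be measured. This is not the attached decodebench.py; the function name `decode_speed`, the choice of 1 MB = 10**6 bytes of encoded input, and the repeat counts are all assumptions made for illustration.

```python
import timeit

def decode_speed(encoding, text, repeat=5, number=100):
    """Return decoding throughput in MB/s (best of `repeat` timing runs).

    Hypothetical helper; 1 MB is taken as 10**6 bytes of encoded input.
    """
    data = text.encode(encoding)
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6

# A few of the input shapes from the tables above.
samples = [
    ("'A'*10000", 'A' * 10000),
    ("'\\u8000'*10000", '\u8000' * 10000),
    ("'\\U00010000'*10000", '\U00010000' * 10000),
]
for label, text in samples:
    for encoding in ('utf-16le', 'utf-16be'):
        print('%-9s %-24s %8.0f' % (encoding, label, decode_speed(encoding, text)))
```

Taking the minimum over several runs reduces the influence of background noise on the result.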
msg159090 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-23 21:57
64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

                                          vanilla 3.3   patched

utf-16le  'A'*10000                       1384 (+278%)	5233
utf-16le      'A'*9999+'\x80'             1303 (+259%)	4684
utf-16le      'A'*9999+'\u0100'           953 (+195%)	2813
utf-16le      'A'*9999+'\u8000'           953 (+195%)	2814
utf-16le      'A'*9999+'\U00010000'       979 (+197%)	2903
utf-16le  '\x80'*10000                    1243 (+321%)	5230
utf-16le    '\x80'+'A'*9999               1256 (+313%)	5188
utf-16le      '\x80'*9999+'\u0100'        880 (+214%)	2765
utf-16le      '\x80'*9999+'\u8000'        880 (+214%)	2763
utf-16le      '\x80'*9999+'\U00010000'    899 (+218%)	2860
utf-16le  '\u0100'*10000                  1047 (+370%)	4917
utf-16le    '\u0100'+'A'*9999             1046 (+369%)	4906
utf-16le    '\u0100'+'\x80'*9999          1047 (+370%)	4920
utf-16le      '\u0100'*9999+'\u8000'      1047 (+369%)	4906
utf-16le      '\u0100'*9999+'\U00010000'  791 (+253%)	2793
utf-16le  '\u8000'*10000                  230 (+410%)	1173
utf-16le    '\u8000'+'A'*9999             1043 (+371%)	4911
utf-16le    '\u8000'+'\x80'*9999          1044 (+345%)	4645
utf-16le    '\u8000'+'\u0100'*9999        1041 (+350%)	4681
utf-16le      '\u8000'*9999+'\U00010000'  215 (+357%)	983
utf-16le  '\U00010000'*10000              362 (+170%)	976
utf-16le    '\U00010000'+'A'*9999         985 (+210%)	3052
utf-16le    '\U00010000'+'\x80'*9999      985 (+211%)	3066
utf-16le    '\U00010000'+'\u0100'*9999    983 (+209%)	3042
utf-16le    '\U00010000'+'\u8000'*9999    245 (+329%)	1052

utf-16be  'A'*10000                       1268 (+313%)	5240
utf-16be      'A'*9999+'\x80'             1199 (+297%)	4758
utf-16be      'A'*9999+'\u0100'           896 (+211%)	2786
utf-16be      'A'*9999+'\u8000'           897 (+211%)	2788
utf-16be      'A'*9999+'\U00010000'       919 (+214%)	2885
utf-16be  '\x80'*10000                    1154 (+341%)	5087
utf-16be    '\x80'+'A'*9999               1155 (+343%)	5112
utf-16be      '\x80'*9999+'\u0100'        829 (+229%)	2728
utf-16be      '\x80'*9999+'\u8000'        828 (+229%)	2726
utf-16be      '\x80'*9999+'\U00010000'    852 (+232%)	2832
utf-16be  '\u0100'*10000                  981 (+332%)	4241
utf-16be    '\u0100'+'A'*9999             981 (+330%)	4220
utf-16be    '\u0100'+'\x80'*9999          977 (+331%)	4213
utf-16be      '\u0100'*9999+'\u8000'      982 (+331%)	4237
utf-16be      '\u0100'*9999+'\U00010000'  748 (+237%)	2520
utf-16be  '\u8000'*10000                  230 (+413%)	1180
utf-16be    '\u8000'+'A'*9999             979 (+331%)	4218
utf-16be    '\u8000'+'\x80'*9999          974 (+333%)	4215
utf-16be    '\u8000'+'\u0100'*9999        972 (+335%)	4226
utf-16be      '\u8000'*9999+'\U00010000'  215 (+361%)	992
utf-16be  '\U00010000'*10000              362 (+170%)	978
utf-16be    '\U00010000'+'A'*9999         924 (+232%)	3064
utf-16be    '\U00010000'+'\x80'*9999      921 (+223%)	2979
utf-16be    '\U00010000'+'\u0100'*9999    921 (+233%)	3064
utf-16be    '\U00010000'+'\u8000'*9999    245 (+329%)	1052
msg159847 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-03 10:34
New changeset 830eeff4fe8f by Victor Stinner in branch 'default':
Issue #14624, #14687: Optimize unicode_widen()
http://hg.python.org/cpython/rev/830eeff4fe8f
msg159858 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-03 13:21
Here is an updated patch, taking into account that unicode_widen is already
optimized.
msg160442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-11 19:24
The patch has been updated to match the style of the UTF-8 decoder. Decoding of non-surrogate UCS2 characters is also slightly faster (+15%).
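
To illustrate the distinction between surrogate and non-surrogate code units, here is a deliberately naive pure-Python decoder for well-formed little-endian UTF-16. It is a sketch for explanation only and does not reflect the C code in the patch; the function name `decode_utf16le` is hypothetical, and there is no handling of lone surrogates or truncated input.

```python
import struct

def decode_utf16le(data):
    """Decode well-formed UTF-16-LE one code unit at a time (no error handling)."""
    units = struct.unpack('<%dH' % (len(data) // 2), data)
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: combine it with the following low surrogate.
            low = units[i + 1]
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        else:
            # Non-surrogate code unit: it is the code point itself.
            out.append(chr(u))
            i += 1
    return ''.join(out)

print(ascii(decode_utf16le('A\u8000\U00010000'.encode('utf-16le'))))
```

Non-surrogate units take the short branch, which is why inputs without surrogate pairs can be decoded with much less work per character.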
msg160572 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-13 20:38
New performance figures under 64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

                                          vanilla 3.3   patched

utf-16le  'A'*10000                       1411 (+290%)	5504
utf-16le      'A'*9999+'\x80'             1368 (+263%)	4970
utf-16le      'A'*9999+'\u0100'           1145 (+151%)	2871
utf-16le      'A'*9999+'\u8000'           1144 (+151%)	2870
utf-16le      'A'*9999+'\U00010000'       1164 (+154%)	2957
utf-16le  '\x80'*10000                    1403 (+271%)	5209
utf-16le    '\x80'+'A'*9999               1406 (+272%)	5235
utf-16le      '\x80'*9999+'\u0100'        1138 (+138%)	2713
utf-16le      '\x80'*9999+'\u8000'        1138 (+139%)	2716
utf-16le      '\x80'*9999+'\U00010000'    1155 (+151%)	2897
utf-16le  '\u0100'*10000                  1477 (+243%)	5062
utf-16le    '\u0100'+'A'*9999             1478 (+243%)	5072
utf-16le    '\u0100'+'\x80'*9999          1477 (+243%)	5062
utf-16le      '\u0100'*9999+'\u8000'      1478 (+242%)	5055
utf-16le      '\u0100'*9999+'\U00010000'  1201 (+131%)	2776
utf-16le  '\u8000'*10000                  246 (+347%)	1100
utf-16le    '\u8000'+'A'*9999             1475 (+244%)	5069
utf-16le    '\u8000'+'\x80'*9999          1474 (+243%)	5062
utf-16le    '\u8000'+'\u0100'*9999        1473 (+243%)	5057
utf-16le      '\u8000'*9999+'\U00010000'  236 (+295%)	932
utf-16le  '\U00010000'*10000              393 (+164%)	1039
utf-16le    '\U00010000'+'A'*9999         1325 (+134%)	3106
utf-16le    '\U00010000'+'\x80'*9999      1326 (+134%)	3103
utf-16le    '\U00010000'+'\u0100'*9999    1326 (+134%)	3104
utf-16le    '\U00010000'+'\u8000'*9999    253 (+331%)	1091

utf-16be  'A'*10000                       1341 (+298%)	5342
utf-16be      'A'*9999+'\x80'             1305 (+275%)	4888
utf-16be      'A'*9999+'\u0100'           1101 (+157%)	2834
utf-16be      'A'*9999+'\u8000'           1102 (+157%)	2831
utf-16be      'A'*9999+'\U00010000'       1115 (+162%)	2917
utf-16be  '\x80'*10000                    1326 (+296%)	5253
utf-16be    '\x80'+'A'*9999               1322 (+298%)	5258
utf-16be      '\x80'*9999+'\u0100'        1088 (+156%)	2781
utf-16be      '\x80'*9999+'\u8000'        1088 (+155%)	2770
utf-16be      '\x80'*9999+'\U00010000'    1103 (+159%)	2854
utf-16be  '\u0100'*10000                  1344 (+221%)	4308
utf-16be    '\u0100'+'A'*9999             1342 (+223%)	4330
utf-16be    '\u0100'+'\x80'*9999          1343 (+221%)	4307
utf-16be      '\u0100'*9999+'\u8000'      1343 (+221%)	4306
utf-16be      '\u0100'*9999+'\U00010000'  1109 (+128%)	2529
utf-16be  '\u8000'*10000                  248 (+341%)	1094
utf-16be    '\u8000'+'A'*9999             1340 (+223%)	4331
utf-16be    '\u8000'+'\x80'*9999          1341 (+221%)	4307
utf-16be    '\u8000'+'\u0100'*9999        1341 (+221%)	4309
utf-16be      '\u8000'*9999+'\U00010000'  239 (+290%)	931
utf-16be  '\U00010000'*10000              399 (+160%)	1037
utf-16be    '\U00010000'+'A'*9999         1230 (+152%)	3101
utf-16be    '\U00010000'+'\x80'*9999      1218 (+154%)	3095
utf-16be    '\U00010000'+'\u0100'*9999    1220 (+154%)	3095
utf-16be    '\U00010000'+'\u8000'*9999    257 (+318%)	1074
msg160672 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-14 22:14
The patch has been updated with slightly clearer code and additional comments.
msg160766 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-15 21:29
Here are two new patches. The check for out-of-range characters has been
moved, which makes the code simpler. In theory this slightly slows down
decoding of short UCS1 strings (up to 1 and 3 chars on 32- and 64-bit
respectively), but in practice there is no difference. The second patch
differs from the first in that the masks are not computed but specified
explicitly. I am not sure that it improves readability. The committer has
the choice.
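
As a rough illustration of the general word-at-a-time idea (not the code from either patch), the sketch below tests several UTF-16 code units at once with a single mask. A 32-bit word holds two code units and a 64-bit word holds four. The names `ucs1_mask`, `MASK32`, `MASK64` and `word_is_ucs1` are hypothetical, and the mask can be either computed or written out explicitly.

```python
def ucs1_mask(word_size):
    """Compute a mask selecting the high byte of every 16-bit unit in a word."""
    mask = 0
    for _ in range(word_size // 2):
        mask = (mask << 16) | 0xFF00
    return mask

# The same masks written out explicitly instead of being computed.
MASK32 = 0xFF00FF00
MASK64 = 0xFF00FF00FF00FF00

def word_is_ucs1(chunk):
    """True if every UTF-16-LE code unit in `chunk` (one word) is below 0x100."""
    word = int.from_bytes(chunk, 'little')
    return word & ucs1_mask(len(chunk)) == 0

print(word_is_ucs1('AB\x80\xff'.encode('utf-16le')))   # all units below 0x100
print(word_is_ucs1('AB\u0100C'.encode('utf-16le')))    # one unit needs 2 bytes
```

If the masked word is zero, all code units in it fit in one byte and can be copied without a per-character range check.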
msg160768 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-15 21:50
New changeset cdcc816dea85 by Antoine Pitrou in branch 'default':
Issue #14624: UTF-16 decoding is now 3x to 4x faster on various inputs.
http://hg.python.org/cpython/rev/cdcc816dea85
msg160769 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-15 21:52
Thank you Serhiy! Now committed.
msg161100 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-19 09:02
Thank you, Antoine. Now only issue14625 waits for review.

> changeset:   77012:3430d7329a3b
> +* UTF-8 and UTF-16 decoding is now 2x to 4x faster.

In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2 on my computers (and sometimes still a little slower). It is 2x to 4x faster only compared to the earlier Python 3.3, which had been slowed down by PEP 393.
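
The two baselines can be compared with a small calculation. The sketch below uses the first row of the Athlon 64 X2 table from msg159077 (utf-16le 'A'*10000); the helper name `gain` is made up for this example, and rounding to the nearest percent is an assumption.

```python
def gain(patched, baseline):
    """Percentage by which `patched` throughput exceeds `baseline`."""
    return round((patched / baseline - 1) * 100)

# utf-16le 'A'*10000 on the Athlon 64 X2, in MB/s (from msg159077).
py32, py33, patched = 1905, 565, 1927

print('vs 3.2: %+d%%' % gain(patched, py32))
print('vs 3.3: %+d%% (%.1fx)' % (gain(patched, py33), patched / py33))
```

Against Python 3.2 the gain for this row is about 1%, while against the unpatched 3.3 it is a factor of about 3.4.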
History
Date User Action Args
2012-05-19 09:02:56 serhiy.storchaka set messages: + msg161100
2012-05-15 21:52:07 pitrou set status: open -> closed; resolution: fixed; messages: + msg160769; stage: resolved
2012-05-15 21:50:53 python-dev set messages: + msg160768
2012-05-15 21:29:28 serhiy.storchaka set files: + decode_utf16_5.patch, decode_utf16_6.patch; messages: + msg160766
2012-05-14 22:14:49 serhiy.storchaka set files: + decode_utf16_4.patch; messages: + msg160672
2012-05-13 20:38:32 pitrou set messages: + msg160572
2012-05-11 19:46:24 serhiy.storchaka set nosy: + ezio.melotti; components: + Unicode
2012-05-11 19:24:30 serhiy.storchaka set files: + decode_utf16_3.patch; messages: + msg160442
2012-05-03 13:21:47 serhiy.storchaka set files: + decode_utf16_2.patch; messages: + msg159858
2012-05-03 10:34:06 python-dev set nosy: + python-dev; messages: + msg159847
2012-04-23 21:57:20 pitrou set messages: + msg159090
2012-04-23 21:01:19 serhiy.storchaka set files: + decodebench.py, bench-diff.py; messages: + msg159077
2012-04-20 21:38:25 asvetlov set nosy: + asvetlov
2012-04-20 06:35:36 Arfrever set nosy: + Arfrever
2012-04-19 23:08:21 loewis set nosy: + loewis; messages: + msg158772
2012-04-19 21:09:26 serhiy.storchaka set messages: + msg158753
2012-04-19 21:03:29 haypo set messages: + msg158751
2012-04-19 21:02:57 haypo set nosy: + haypo
2012-04-19 21:01:54 haypo set nosy: + pitrou
2012-04-19 20:59:00 serhiy.storchaka create