classification
Title: Faster utf-32 decoder
Type: performance Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, asvetlov, ezio.melotti, georg.brandl, haypo, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: 3.3regression, needs review, patch

Created on 2012-04-19 20:59 by serhiy.storchaka, last changed 2012-10-30 23:27 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
decode_utf32_a_3.patch serhiy.storchaka, 2012-10-20 19:10 review
decode_utf32_b_3.patch serhiy.storchaka, 2012-10-20 19:11 review
Messages (14)
msg158749 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-19 20:59
I suggest two variants of patch, accelerating the utf-32 decoder. With PEP 393 utf-32 decoder slowed down up to 2x, these patches returns a performance at the level of Python 3.2 and even much higher (2-3x over 3.2). The variant A is simpler, but the variant B is a little faster (+8-15%).
msg158750 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-04-19 21:03
See also #14624 for UTF-16 decoder.
msg159078 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-23 21:01
Here are the results of benchmarking (numbers in MB/s).

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                          Py2.7         Py3.2         Py3.3         patchA       patchB

utf-32le  'A'*10000                       461 (+215%)   454 (+220%)   292 (+398%)   1213 (+20%)   1454
utf-32le  '\x80'*10000                    458 (+177%)   454 (+180%)   271 (+369%)   1124 (+13%)   1270
utf-32le    '\x80'+'A'*9999               462 (+171%)   454 (+175%)   271 (+361%)   1111 (+13%)   1250
utf-32le  '\u0100'*10000                  457 (+133%)   454 (+135%)   220 (+385%)   1002 (+6%)    1067
utf-32le    '\u0100'+'A'*9999             461 (+129%)   454 (+132%)   220 (+379%)   993 (+6%)     1054
utf-32le    '\u0100'+'\x80'*9999          458 (+131%)   454 (+133%)   220 (+381%)   1002 (+6%)    1059
utf-32le  '\u8000'*10000                  457 (+133%)   454 (+135%)   220 (+385%)   1002 (+6%)    1066
utf-32le    '\u8000'+'A'*9999             462 (+128%)   454 (+132%)   220 (+379%)   994 (+6%)     1053
utf-32le    '\u8000'+'\x80'*9999          457 (+132%)   454 (+134%)   220 (+382%)   1000 (+6%)    1061
utf-32le    '\u8000'+'\u0100'*9999        457 (+132%)   454 (+134%)   220 (+382%)   1002 (+6%)    1061
utf-32le  '\U00010000'*10000              386 (+167%)   416 (+148%)   212 (+386%)   930 (+11%)    1031
utf-32le    '\U00010000'+'A'*9999         461 (+126%)   415 (+152%)   222 (+370%)   940 (+11%)    1044
utf-32le    '\U00010000'+'\x80'*9999      458 (+125%)   415 (+148%)   222 (+364%)   930 (+11%)    1031
utf-32le    '\U00010000'+'\u0100'*9999    458 (+125%)   415 (+148%)   212 (+386%)   930 (+11%)    1031
utf-32le    '\U00010000'+'\u8000'*9999    458 (+125%)   415 (+149%)   222 (+365%)   930 (+11%)    1032

utf-32be  'A'*10000                       461 (+216%)   454 (+221%)   292 (+399%)   1209 (+20%)   1456
utf-32be  '\x80'*10000                    457 (+177%)   454 (+179%)   271 (+368%)   1125 (+13%)   1268
utf-32be    '\x80'+'A'*9999               462 (+171%)   453 (+176%)   271 (+362%)   1112 (+12%)   1251
utf-32be  '\u0100'*10000                  457 (+144%)   453 (+146%)   220 (+407%)   1048 (+6%)    1116
utf-32be    '\u0100'+'A'*9999             462 (+139%)   454 (+143%)   220 (+402%)   1034 (+7%)    1104
utf-32be    '\u0100'+'\x80'*9999          459 (+142%)   453 (+145%)   220 (+405%)   1047 (+6%)    1112
utf-32be  '\u8000'*10000                  457 (+144%)   453 (+147%)   220 (+408%)   1046 (+7%)    1117
utf-32be    '\u8000'+'A'*9999             462 (+139%)   454 (+143%)   220 (+402%)   1034 (+7%)    1104
utf-32be    '\u8000'+'\x80'*9999          459 (+142%)   453 (+145%)   220 (+405%)   1045 (+6%)    1112
utf-32be    '\u8000'+'\u0100'*9999        459 (+142%)   454 (+144%)   220 (+404%)   1047 (+6%)    1109
utf-32be  '\U00010000'*10000              386 (+155%)   416 (+137%)   212 (+364%)   940 (+5%)     984
utf-32be    '\U00010000'+'A'*9999         461 (+116%)   415 (+140%)   213 (+367%)   948 (+5%)     994
utf-32be    '\U00010000'+'\x80'*9999      458 (+115%)   415 (+137%)   222 (+343%)   938 (+5%)     983
utf-32be    '\U00010000'+'\u0100'*9999    458 (+115%)   415 (+137%)   212 (+364%)   940 (+5%)     983
utf-32be    '\U00010000'+'\u8000'*9999    458 (+115%)   415 (+137%)   222 (+343%)   939 (+5%)     983

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                          Py2.7         Py3.2         Py3.3         patchA       patchB

utf-32le  'A'*10000                       165 (+173%)   165 (+173%)   100 (+350%)   389 (+16%)    450
utf-32le  '\x80'*10000                    165 (+159%)   165 (+159%)   76 (+462%)    374 (+14%)    427
utf-32le    '\x80'+'A'*9999               165 (+161%)   165 (+161%)   76 (+466%)    374 (+15%)    430
utf-32le  '\u0100'*10000                  165 (+119%)   165 (+119%)   81 (+346%)    333 (+8%)     361
utf-32le    '\u0100'+'A'*9999             165 (+120%)   165 (+120%)   81 (+348%)    334 (+9%)     363
utf-32le    '\u0100'+'\x80'*9999          165 (+119%)   165 (+119%)   81 (+347%)    334 (+8%)     362
utf-32le  '\u8000'*10000                  165 (+119%)   165 (+119%)   80 (+352%)    333 (+9%)     362
utf-32le    '\u8000'+'A'*9999             165 (+119%)   165 (+119%)   81 (+347%)    334 (+8%)     362
utf-32le    '\u8000'+'\x80'*9999          165 (+119%)   165 (+119%)   81 (+347%)    334 (+8%)     362
utf-32le    '\u8000'+'\u0100'*9999        165 (+118%)   165 (+118%)   81 (+343%)    333 (+8%)     359
utf-32le  '\U00010000'*10000              155 (+130%)   151 (+136%)   80 (+346%)    324 (+10%)    357
utf-32le    '\U00010000'+'A'*9999         165 (+117%)   165 (+117%)   80 (+348%)    325 (+10%)    358
utf-32le    '\U00010000'+'\x80'*9999      165 (+118%)   165 (+118%)   80 (+349%)    325 (+10%)    359
utf-32le    '\U00010000'+'\u0100'*9999    165 (+116%)   165 (+116%)   80 (+346%)    324 (+10%)    357
utf-32le    '\U00010000'+'\u8000'*9999    165 (+117%)   165 (+117%)   80 (+348%)    324 (+10%)    358

utf-32be  'A'*10000                       165 (+172%)   165 (+172%)   100 (+348%)   390 (+15%)    448
utf-32be  '\x80'*10000                    165 (+159%)   165 (+159%)   75 (+469%)    373 (+14%)    427
utf-32be    '\x80'+'A'*9999               165 (+160%)   165 (+160%)   75 (+472%)    375 (+14%)    429
utf-32be  '\u0100'*10000                  165 (+119%)   165 (+119%)   81 (+347%)    334 (+8%)     362
utf-32be    '\u0100'+'A'*9999             165 (+120%)   165 (+120%)   81 (+348%)    335 (+8%)     363
utf-32be    '\u0100'+'\x80'*9999          165 (+119%)   165 (+119%)   81 (+347%)    335 (+8%)     362
utf-32be  '\u8000'*10000                  165 (+119%)   165 (+119%)   81 (+347%)    334 (+8%)     362
utf-32be    '\u8000'+'A'*9999             165 (+120%)   165 (+120%)   81 (+348%)    334 (+9%)     363
utf-32be    '\u8000'+'\x80'*9999          165 (+119%)   165 (+119%)   81 (+347%)    335 (+8%)     362
utf-32be    '\u8000'+'\u0100'*9999        165 (+118%)   165 (+118%)   81 (+344%)    335 (+7%)     360
utf-32be  '\U00010000'*10000              155 (+130%)   151 (+136%)   80 (+346%)    324 (+10%)    357
utf-32be    '\U00010000'+'A'*9999         165 (+117%)   165 (+117%)   80 (+348%)    325 (+10%)    358
utf-32be    '\U00010000'+'\x80'*9999      165 (+118%)   165 (+118%)   80 (+349%)    325 (+10%)    359
utf-32be    '\U00010000'+'\u0100'*9999    165 (+117%)   165 (+117%)   80 (+348%)    324 (+10%)    358
utf-32be    '\U00010000'+'\u8000'*9999    165 (+117%)   165 (+117%)   80 (+348%)    325 (+10%)    358

For scripts see issue14624.
msg160443 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-11 19:24
The patches updated to stylistic conformity of the UTF-8 decoder. Patch B is significantly accelerated for aligned input data (i. e. almost always), especially for natural order. The UTF-32 decoder can now be faster than ASCII decoder! May be it is time to change the title to "Amazingly faster UTF-32 decoding"? ;)

                                          Py3.2         Py3.3         patchA       patchB

utf-32le  'A'*10000                       162 (+462%)   100 (+810%)   391 (+133%)   910
utf-32le      'A'*9999+'\x80'             162 (+411%)   99 (+736%)    377 (+120%)   828
utf-32le      'A'*9999+'\u0100'           162 (+277%)   95 (+543%)    324 (+89%)    611
utf-32le      'A'*9999+'\u8000'           162 (+278%)   95 (+545%)    324 (+89%)    613
utf-32le      'A'*9999+'\U00010000'       162 (+280%)   95 (+547%)    322 (+91%)    615
utf-32le  '\x80'*10000                    162 (+436%)   94 (+823%)    389 (+123%)   868
utf-32le    '\x80'+'A'*9999               162 (+441%)   94 (+832%)    388 (+126%)   876
utf-32le      '\x80'*9999+'\u0100'        162 (+273%)   90 (+571%)    320 (+89%)    604
utf-32le      '\x80'*9999+'\u8000'        162 (+271%)   90 (+568%)    319 (+88%)    601
utf-32le      '\x80'*9999+'\U00010000'    162 (+268%)   90 (+562%)    318 (+87%)    596
utf-32le  '\u0100'*10000                  161 (+445%)   83 (+958%)    405 (+117%)   878
utf-32le    '\u0100'+'A'*9999             162 (+440%)   83 (+954%)    403 (+117%)   875
utf-32le    '\u0100'+'\x80'*9999          162 (+444%)   83 (+963%)    403 (+119%)   882
utf-32le      '\u0100'*9999+'\u8000'      162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le      '\u0100'*9999+'\U00010000'  162 (+259%)   79 (+637%)    325 (+79%)    582
utf-32le  '\u8000'*10000                  162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'A'*9999             162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'\x80'*9999          161 (+448%)   83 (+964%)    403 (+119%)   883
utf-32le    '\u8000'+'\u0100'*9999        161 (+443%)   83 (+954%)    402 (+118%)   875
utf-32le      '\u8000'*9999+'\U00010000'  162 (+262%)   79 (+643%)    325 (+81%)    587
utf-32le  '\U00010000'*10000              149 (+483%)   83 (+947%)    390 (+123%)   869
utf-32le    '\U00010000'+'A'*9999         162 (+444%)   83 (+963%)    389 (+127%)   882
utf-32le    '\U00010000'+'\x80'*9999      162 (+430%)   83 (+935%)    389 (+121%)   859
utf-32le    '\U00010000'+'\u0100'*9999    162 (+429%)   83 (+933%)    389 (+120%)   857
utf-32le    '\U00010000'+'\u8000'*9999    162 (+431%)   83 (+937%)    388 (+122%)   861

utf-32be  'A'*10000                       162 (+199%)   100 (+384%)   393 (+23%)    484
utf-32be      'A'*9999+'\x80'             162 (+186%)   99 (+368%)    376 (+23%)    463
utf-32be      'A'*9999+'\u0100'           162 (+138%)   95 (+306%)    323 (+20%)    386
utf-32be      'A'*9999+'\u8000'           162 (+139%)   95 (+307%)    323 (+20%)    387
utf-32be      'A'*9999+'\U00010000'       162 (+138%)   95 (+305%)    322 (+20%)    385
utf-32be  '\x80'*10000                    161 (+196%)   94 (+407%)    389 (+23%)    477
utf-32be    '\x80'+'A'*9999               161 (+197%)   94 (+409%)    387 (+24%)    478
utf-32be      '\x80'*9999+'\u0100'        161 (+137%)   90 (+324%)    321 (+19%)    382
utf-32be      '\x80'*9999+'\u8000'        162 (+135%)   89 (+328%)    320 (+19%)    381
utf-32be      '\x80'*9999+'\U00010000'    162 (+134%)   89 (+326%)    318 (+19%)    379
utf-32be  '\u0100'*10000                  161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be    '\u0100'+'A'*9999             161 (+196%)   83 (+475%)    402 (+19%)    477
utf-32be    '\u0100'+'\x80'*9999          162 (+196%)   83 (+477%)    403 (+19%)    479
utf-32be      '\u0100'*9999+'\u8000'      161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be      '\u0100'*9999+'\U00010000'  162 (+131%)   79 (+373%)    325 (+15%)    374
utf-32be  '\u8000'*10000                  161 (+195%)   83 (+472%)    404 (+18%)    475
utf-32be    '\u8000'+'A'*9999             161 (+197%)   83 (+476%)    402 (+19%)    478
utf-32be    '\u8000'+'\x80'*9999          161 (+197%)   83 (+476%)    403 (+19%)    478
utf-32be    '\u8000'+'\u0100'*9999        162 (+194%)   83 (+473%)    403 (+18%)    476
utf-32be      '\u8000'*9999+'\U00010000'  161 (+133%)   79 (+375%)    325 (+15%)    375
utf-32be  '\U00010000'*10000              148 (+222%)   83 (+473%)    391 (+22%)    476
utf-32be    '\U00010000'+'A'*9999         161 (+198%)   83 (+477%)    389 (+23%)    479
utf-32be    '\U00010000'+'\x80'*9999      162 (+194%)   83 (+473%)    389 (+22%)    476
utf-32be    '\U00010000'+'\u0100'*9999    162 (+194%)   83 (+475%)    389 (+23%)    477
utf-32be    '\U00010000'+'\u8000'*9999    161 (+196%)   83 (+475%)    389 (+23%)    477
msg173405 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:10
Patches updated to 3.4.
msg173406 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:23
I suggest apply patch A to 3.3 as it fixes performance regression (2x) and is very simple.
msg173407 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-20 19:38
Very simple? You're changing most of the code there.
msg173408 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:59
It was too complicated code. Actually patched code is smaller.

 1 file changed, 71 insertions(+), 80 deletions(-)

UTF-16 codec was modified in some way.
msg173409 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-20 20:21
That the new code is smaller is no guarantee that it's as correct :)

That is exactly the reason we don't put optimizations in bugfix releases.
msg174170 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-30 00:59
> I suggest apply patch A to 3.3 as it fixes performance
> regression (2x) and is very simple.

ASCII and UTF-8 are the two most common codecs in the world, so it's justified to have heavily optimized encoders and decoders.

I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.

--

32 bit units is commonly used with wchar_t, but this format already has a fast decoder, PyUnicode_FromWideChar(), which uses memcpy() or _PyUnicode_CONVERT_BYTES().
msg174188 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 09:29
> I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.

Agree.  I had the same doubts.  That's why I proposed two patches for your choice.
msg174233 - (view) Author: Roundup Robot (python-dev) Date: 2012-10-30 22:24
New changeset 9badfe3a31a7 by Victor Stinner in branch 'default':
Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
http://hg.python.org/cpython/rev/9badfe3a31a7
msg174234 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-30 22:26
I applied the patch "A" with minor changes: replace multiple goto with classic break/continue and if/else.
msg174239 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 23:27
> I applied the patch "A" with minor changes: replace multiple goto with
> classic break/continue and if/else.

Looks good. Thanks.
History
Date User Action Args
2012-10-30 23:27:45serhiy.storchakasetmessages: + msg174239
2012-10-30 22:26:24hayposetmessages: + msg174234
2012-10-30 22:24:09python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg174233

resolution: fixed
stage: patch review -> resolved
2012-10-30 09:29:56serhiy.storchakasetmessages: + msg174188
2012-10-30 00:59:28hayposetmessages: + msg174170
2012-10-24 09:02:12serhiy.storchakasetstage: patch review
2012-10-20 20:21:48georg.brandlsetmessages: + msg173409
2012-10-20 19:59:49serhiy.storchakasetmessages: + msg173408
2012-10-20 19:38:13georg.brandlsetmessages: + msg173407
2012-10-20 19:25:53serhiy.storchakasetnosy: + georg.brandl
2012-10-20 19:23:40serhiy.storchakasetmessages: + msg173406
versions: + Python 3.3
2012-10-20 19:11:26serhiy.storchakasetkeywords: + 3.3regression
files: + decode_utf32_b_3.patch
2012-10-20 19:10:07serhiy.storchakasetkeywords: + needs review
files: + decode_utf32_a_3.patch
messages: + msg173405

versions: + Python 3.4, - Python 3.3
2012-10-20 19:07:24serhiy.storchakasetfiles: - decode_utf32_b_2.patch
2012-10-20 19:06:51serhiy.storchakasetfiles: - decode_utf32_a_2.patch
2012-10-20 19:06:34serhiy.storchakasetfiles: - decode_utf32_b.patch
2012-10-20 19:06:21serhiy.storchakasetfiles: - decode_utf32_a.patch
2012-05-11 19:46:42serhiy.storchakasetnosy: + ezio.melotti
components: + Unicode
2012-05-11 19:25:28serhiy.storchakasetfiles: + decode_utf32_b_2.patch
2012-05-11 19:24:54serhiy.storchakasetfiles: + decode_utf32_a_2.patch

messages: + msg160443
2012-04-23 21:01:40serhiy.storchakasetmessages: + msg159078
2012-04-20 21:38:12asvetlovsetnosy: + asvetlov
2012-04-20 06:36:35Arfreversetnosy: + Arfrever
2012-04-19 21:03:20hayposetnosy: + pitrou, haypo
messages: + msg158750
2012-04-19 21:00:28serhiy.storchakasetfiles: + decode_utf32_b.patch
2012-04-19 20:59:56serhiy.storchakacreate