Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster utf-32 decoder #58830

Closed
serhiy-storchaka opened this issue Apr 19, 2012 · 14 comments
Closed

Faster utf-32 decoder #58830

serhiy-storchaka opened this issue Apr 19, 2012 · 14 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode

Comments

@serhiy-storchaka
Copy link
Member

BPO 14625
Nosy @birkenfeld, @pitrou, @vstinner, @ezio-melotti, @asvetlov, @serhiy-storchaka
Files
  • decode_utf32_a_3.patch
  • decode_utf32_b_3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-10-30.22:24:09.867>
    created_at = <Date 2012-04-19.20:59:56.752>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'Faster utf-32 decoder'
    updated_at = <Date 2012-10-30.23:27:45.670>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-10-30.23:27:45.670>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-10-30.22:24:09.867>
    closer = 'python-dev'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2012-04-19.20:59:56.752>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['27638', '27639']
    hgrepos = []
    issue_num = 14625
    keywords = ['patch', 'needs review', '3.3regression']
    message_count = 14.0
    messages = ['158749', '158750', '159078', '160443', '173405', '173406', '173407', '173408', '173409', '174170', '174188', '174233', '174234', '174239']
    nosy_count = 8.0
    nosy_names = ['georg.brandl', 'pitrou', 'vstinner', 'ezio.melotti', 'Arfrever', 'asvetlov', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue14625'
    versions = ['Python 3.3', 'Python 3.4']

    @serhiy-storchaka
    Copy link
    Member Author

    I suggest two variants of patch, accelerating the utf-32 decoder. With PEP-393 utf-32 decoder slowed down up to 2x, these patches returns a performance at the level of Python 3.2 and even much higher (2-3x over 3.2). The variant A is simpler, but the variant B is a little faster (+8-15%).

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 19, 2012
    @vstinner
    Copy link
    Member

    See also bpo-14624 for UTF-16 decoder.

    @serhiy-storchaka
    Copy link
    Member Author

    Here are the results of benchmarking (numbers in MB/s).

    On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                          Py2.7         Py3.2         Py3.3         patchA       patchB
    

    utf-32le 'A'*10000 461 (+215%) 454 (+220%) 292 (+398%) 1213 (+20%) 1454
    utf-32le '\x80'*10000 458 (+177%) 454 (+180%) 271 (+369%) 1124 (+13%) 1270
    utf-32le '\x80'+'A'*9999 462 (+171%) 454 (+175%) 271 (+361%) 1111 (+13%) 1250
    utf-32le '\u0100'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1067
    utf-32le '\u0100'+'A'*9999 461 (+129%) 454 (+132%) 220 (+379%) 993 (+6%) 1054
    utf-32le '\u0100'+'\x80'*9999 458 (+131%) 454 (+133%) 220 (+381%) 1002 (+6%) 1059
    utf-32le '\u8000'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1066
    utf-32le '\u8000'+'A'*9999 462 (+128%) 454 (+132%) 220 (+379%) 994 (+6%) 1053
    utf-32le '\u8000'+'\x80'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1000 (+6%) 1061
    utf-32le '\u8000'+'\u0100'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1002 (+6%) 1061
    utf-32le '\U00010000'*10000 386 (+167%) 416 (+148%) 212 (+386%) 930 (+11%) 1031
    utf-32le '\U00010000'+'A'*9999 461 (+126%) 415 (+152%) 222 (+370%) 940 (+11%) 1044
    utf-32le '\U00010000'+'\x80'*9999 458 (+125%) 415 (+148%) 222 (+364%) 930 (+11%) 1031
    utf-32le '\U00010000'+'\u0100'*9999 458 (+125%) 415 (+148%) 212 (+386%) 930 (+11%) 1031
    utf-32le '\U00010000'+'\u8000'*9999 458 (+125%) 415 (+149%) 222 (+365%) 930 (+11%) 1032

    utf-32be 'A'*10000 461 (+216%) 454 (+221%) 292 (+399%) 1209 (+20%) 1456
    utf-32be '\x80'*10000 457 (+177%) 454 (+179%) 271 (+368%) 1125 (+13%) 1268
    utf-32be '\x80'+'A'*9999 462 (+171%) 453 (+176%) 271 (+362%) 1112 (+12%) 1251
    utf-32be '\u0100'*10000 457 (+144%) 453 (+146%) 220 (+407%) 1048 (+6%) 1116
    utf-32be '\u0100'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104
    utf-32be '\u0100'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1047 (+6%) 1112
    utf-32be '\u8000'*10000 457 (+144%) 453 (+147%) 220 (+408%) 1046 (+7%) 1117
    utf-32be '\u8000'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104
    utf-32be '\u8000'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1045 (+6%) 1112
    utf-32be '\u8000'+'\u0100'*9999 459 (+142%) 454 (+144%) 220 (+404%) 1047 (+6%) 1109
    utf-32be '\U00010000'*10000 386 (+155%) 416 (+137%) 212 (+364%) 940 (+5%) 984
    utf-32be '\U00010000'+'A'*9999 461 (+116%) 415 (+140%) 213 (+367%) 948 (+5%) 994
    utf-32be '\U00010000'+'\x80'*9999 458 (+115%) 415 (+137%) 222 (+343%) 938 (+5%) 983
    utf-32be '\U00010000'+'\u0100'*9999 458 (+115%) 415 (+137%) 212 (+364%) 940 (+5%) 983
    utf-32be '\U00010000'+'\u8000'*9999 458 (+115%) 415 (+137%) 222 (+343%) 939 (+5%) 983

    On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                          Py2.7         Py3.2         Py3.3         patchA       patchB
    

    utf-32le 'A'*10000 165 (+173%) 165 (+173%) 100 (+350%) 389 (+16%) 450
    utf-32le '\x80'*10000 165 (+159%) 165 (+159%) 76 (+462%) 374 (+14%) 427
    utf-32le '\x80'+'A'*9999 165 (+161%) 165 (+161%) 76 (+466%) 374 (+15%) 430
    utf-32le '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+346%) 333 (+8%) 361
    utf-32le '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363
    utf-32le '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
    utf-32le '\u8000'*10000 165 (+119%) 165 (+119%) 80 (+352%) 333 (+9%) 362
    utf-32le '\u8000'+'A'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
    utf-32le '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
    utf-32le '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+343%) 333 (+8%) 359
    utf-32le '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357
    utf-32le '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358
    utf-32le '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359
    utf-32le '\U00010000'+'\u0100'*9999 165 (+116%) 165 (+116%) 80 (+346%) 324 (+10%) 357
    utf-32le '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358

    utf-32be 'A'*10000 165 (+172%) 165 (+172%) 100 (+348%) 390 (+15%) 448
    utf-32be '\x80'*10000 165 (+159%) 165 (+159%) 75 (+469%) 373 (+14%) 427
    utf-32be '\x80'+'A'*9999 165 (+160%) 165 (+160%) 75 (+472%) 375 (+14%) 429
    utf-32be '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
    utf-32be '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 335 (+8%) 363
    utf-32be '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362
    utf-32be '\u8000'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362
    utf-32be '\u8000'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363
    utf-32be '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362
    utf-32be '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+344%) 335 (+7%) 360
    utf-32be '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357
    utf-32be '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358
    utf-32be '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359
    utf-32be '\U00010000'+'\u0100'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358
    utf-32be '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358

    For scripts see bpo-14624.

    @serhiy-storchaka
    Copy link
    Member Author

    The patches updated to stylistic conformity of the UTF-8 decoder. Patch B is significantly accelerated for aligned input data (i. e. almost always), especially for natural order. The UTF-32 decoder can now be faster than ASCII decoder! May be it is time to change the title to "Amazingly faster UTF-32 decoding"? ;)

                                          Py3.2         Py3.3         patchA       patchB
    

    utf-32le 'A'*10000 162 (+462%) 100 (+810%) 391 (+133%) 910
    utf-32le 'A'*9999+'\x80' 162 (+411%) 99 (+736%) 377 (+120%) 828
    utf-32le 'A'*9999+'\u0100' 162 (+277%) 95 (+543%) 324 (+89%) 611
    utf-32le 'A'*9999+'\u8000' 162 (+278%) 95 (+545%) 324 (+89%) 613
    utf-32le 'A'*9999+'\U00010000' 162 (+280%) 95 (+547%) 322 (+91%) 615
    utf-32le '\x80'*10000 162 (+436%) 94 (+823%) 389 (+123%) 868
    utf-32le '\x80'+'A'*9999 162 (+441%) 94 (+832%) 388 (+126%) 876
    utf-32le '\x80'*9999+'\u0100' 162 (+273%) 90 (+571%) 320 (+89%) 604
    utf-32le '\x80'*9999+'\u8000' 162 (+271%) 90 (+568%) 319 (+88%) 601
    utf-32le '\x80'*9999+'\U00010000' 162 (+268%) 90 (+562%) 318 (+87%) 596
    utf-32le '\u0100'*10000 161 (+445%) 83 (+958%) 405 (+117%) 878
    utf-32le '\u0100'+'A'*9999 162 (+440%) 83 (+954%) 403 (+117%) 875
    utf-32le '\u0100'+'\x80'*9999 162 (+444%) 83 (+963%) 403 (+119%) 882
    utf-32le '\u0100'*9999+'\u8000' 162 (+441%) 83 (+955%) 404 (+117%) 876
    utf-32le '\u0100'*9999+'\U00010000' 162 (+259%) 79 (+637%) 325 (+79%) 582
    utf-32le '\u8000'*10000 162 (+441%) 83 (+955%) 404 (+117%) 876
    utf-32le '\u8000'+'A'*9999 162 (+441%) 83 (+955%) 404 (+117%) 876
    utf-32le '\u8000'+'\x80'*9999 161 (+448%) 83 (+964%) 403 (+119%) 883
    utf-32le '\u8000'+'\u0100'*9999 161 (+443%) 83 (+954%) 402 (+118%) 875
    utf-32le '\u8000'*9999+'\U00010000' 162 (+262%) 79 (+643%) 325 (+81%) 587
    utf-32le '\U00010000'*10000 149 (+483%) 83 (+947%) 390 (+123%) 869
    utf-32le '\U00010000'+'A'*9999 162 (+444%) 83 (+963%) 389 (+127%) 882
    utf-32le '\U00010000'+'\x80'*9999 162 (+430%) 83 (+935%) 389 (+121%) 859
    utf-32le '\U00010000'+'\u0100'*9999 162 (+429%) 83 (+933%) 389 (+120%) 857
    utf-32le '\U00010000'+'\u8000'*9999 162 (+431%) 83 (+937%) 388 (+122%) 861

    utf-32be 'A'*10000 162 (+199%) 100 (+384%) 393 (+23%) 484
    utf-32be 'A'*9999+'\x80' 162 (+186%) 99 (+368%) 376 (+23%) 463
    utf-32be 'A'*9999+'\u0100' 162 (+138%) 95 (+306%) 323 (+20%) 386
    utf-32be 'A'*9999+'\u8000' 162 (+139%) 95 (+307%) 323 (+20%) 387
    utf-32be 'A'*9999+'\U00010000' 162 (+138%) 95 (+305%) 322 (+20%) 385
    utf-32be '\x80'*10000 161 (+196%) 94 (+407%) 389 (+23%) 477
    utf-32be '\x80'+'A'*9999 161 (+197%) 94 (+409%) 387 (+24%) 478
    utf-32be '\x80'*9999+'\u0100' 161 (+137%) 90 (+324%) 321 (+19%) 382
    utf-32be '\x80'*9999+'\u8000' 162 (+135%) 89 (+328%) 320 (+19%) 381
    utf-32be '\x80'*9999+'\U00010000' 162 (+134%) 89 (+326%) 318 (+19%) 379
    utf-32be '\u0100'*10000 161 (+196%) 83 (+473%) 404 (+18%) 476
    utf-32be '\u0100'+'A'*9999 161 (+196%) 83 (+475%) 402 (+19%) 477
    utf-32be '\u0100'+'\x80'*9999 162 (+196%) 83 (+477%) 403 (+19%) 479
    utf-32be '\u0100'*9999+'\u8000' 161 (+196%) 83 (+473%) 404 (+18%) 476
    utf-32be '\u0100'*9999+'\U00010000' 162 (+131%) 79 (+373%) 325 (+15%) 374
    utf-32be '\u8000'*10000 161 (+195%) 83 (+472%) 404 (+18%) 475
    utf-32be '\u8000'+'A'*9999 161 (+197%) 83 (+476%) 402 (+19%) 478
    utf-32be '\u8000'+'\x80'*9999 161 (+197%) 83 (+476%) 403 (+19%) 478
    utf-32be '\u8000'+'\u0100'*9999 162 (+194%) 83 (+473%) 403 (+18%) 476
    utf-32be '\u8000'*9999+'\U00010000' 161 (+133%) 79 (+375%) 325 (+15%) 375
    utf-32be '\U00010000'*10000 148 (+222%) 83 (+473%) 391 (+22%) 476
    utf-32be '\U00010000'+'A'*9999 161 (+198%) 83 (+477%) 389 (+23%) 479
    utf-32be '\U00010000'+'\x80'*9999 162 (+194%) 83 (+473%) 389 (+22%) 476
    utf-32be '\U00010000'+'\u0100'*9999 162 (+194%) 83 (+475%) 389 (+23%) 477
    utf-32be '\U00010000'+'\u8000'*9999 161 (+196%) 83 (+475%) 389 (+23%) 477

    @serhiy-storchaka
    Copy link
    Member Author

    Patches updated to 3.4.

    @serhiy-storchaka
    Copy link
    Member Author

    I suggest apply patch A to 3.3 as it fixes performance regression (2x) and is very simple.

    @birkenfeld
    Copy link
    Member

    Very simple? You're changing most of the code there.

    @serhiy-storchaka
    Copy link
    Member Author

    It was too complicated code. Actually patched code is smaller.

    1 file changed, 71 insertions(+), 80 deletions(-)

    UTF-16 codec was modified in some way.

    @birkenfeld
    Copy link
    Member

    That the new code is smaller is no guarantee that it's as correct :)

    That is exactly the reason we don't put optimizations in bugfix releases.

    @vstinner
    Copy link
    Member

    I suggest apply patch A to 3.3 as it fixes performance
    regression (2x) and is very simple.

    ASCII and UTF-8 are the two most common codecs in the world, so it's justified to have heavily optimized encoders and decoders.

    I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.

    --

    32 bit units is commonly used with wchar_t, but this format already has a fast decoder, PyUnicode_FromWideChar(), which uses memcpy() or _PyUnicode_CONVERT_BYTES().

    @serhiy-storchaka
    Copy link
    Member Author

    I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough.

    Agree. I had the same doubts. That's why I proposed two patches for your choice.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 30, 2012

    New changeset 9badfe3a31a7 by Victor Stinner in branch 'default':
    Close bpo-14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
    http://hg.python.org/cpython/rev/9badfe3a31a7

    @python-dev python-dev mannequin closed this as completed Oct 30, 2012
    @vstinner
    Copy link
    Member

    I applied the patch "A" with minor changes: replace multiple goto with classic break/continue and if/else.

    @serhiy-storchaka
    Copy link
    Member Author

    I applied the patch "A" with minor changes: replace multiple goto with
    classic break/continue and if/else.

    Looks good. Thanks.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants