Faster utf-16 decoder #58829

Closed
serhiy-storchaka opened this issue Apr 19, 2012 · 15 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), topic-unicode

Comments

@serhiy-storchaka
Member

BPO 14624
Nosy @loewis, @pitrou, @vstinner, @ezio-melotti, @asvetlov, @serhiy-storchaka
Files
  • decode_utf16.patch
  • decodebench.py
  • bench-diff.py
  • decode_utf16_2.patch
  • decode_utf16_3.patch
  • decode_utf16_4.patch
  • decode_utf16_5.patch
  • decode_utf16_6.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-15.21:52:07.385>
    created_at = <Date 2012-04-19.20:59:00.603>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'Faster utf-16 decoder'
    updated_at = <Date 2012-05-19.09:02:56.808>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-05-19.09:02:56.808>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-15.21:52:07.385>
    closer = 'pitrou'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2012-04-19.20:59:00.603>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['25277', '25323', '25324', '25443', '25536', '25590', '25601', '25602']
    hgrepos = []
    issue_num = 14624
    keywords = ['patch']
    message_count = 15.0
    messages = ['158748', '158751', '158753', '158772', '159077', '159090', '159847', '159858', '160442', '160572', '160672', '160766', '160768', '160769', '161100']
    nosy_count = 8.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'ezio.melotti', 'Arfrever', 'asvetlov', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue14624'
    versions = ['Python 3.3']

    @serhiy-storchaka
    Member Author

    I propose a patch that speeds up the UTF-16 decoder. With PEP 393, the UTF-16 decoder became 3-4x slower; this patch brings performance back to the level of Python 3.2 and even above it (+10-30% over 3.2).

    In addition, it fixes a few bugs in the UTF-16 decoder. As a side effect, other decoders may also become faster.

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 19, 2012
    @vstinner
    Member

    See also bpo-14625 for UTF-32 decoder.

    @serhiy-storchaka
    Member Author

    See also issue bpo-14579 for utf-16 decoder bugs.

    @loewis
    Mannequin

    loewis mannequin commented Apr 19, 2012

    Serhiy: can you please submit a contributor form?

    @serhiy-storchaka
    Member Author

    Here are the results of benchmarking (numbers in MB/s).

    On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

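    Columns: codec, input, Py2.7, Py3.2, Py3.3, patch. Each percentage in parentheses is the change of the patched build relative to that column.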

    utf-16le 'A'*10000 504 (+282%) 1905 (+1%) 565 (+241%) 1927
    utf-16le '\x80'*10000 503 (+264%) 1894 (-3%) 417 (+340%) 1833
    utf-16le '\x80'+'A'*9999 504 (+264%) 1890 (-3%) 422 (+335%) 1834
    utf-16le '\u0100'*10000 503 (+249%) 1896 (-7%) 357 (+391%) 1754
    utf-16le '\u0100'+'A'*9999 504 (+252%) 1896 (-6%) 360 (+393%) 1776
    utf-16le '\u0100'+'\x80'*9999 503 (+249%) 1890 (-7%) 357 (+392%) 1756
    utf-16le '\u8000'*10000 503 (-18%) 355 (+16%) 75 (+449%) 412
    utf-16le '\u8000'+'A'*9999 504 (+254%) 1892 (-6%) 359 (+397%) 1783
    utf-16le '\u8000'+'\x80'*9999 503 (+249%) 1896 (-7%) 357 (+392%) 1755
    utf-16le '\u8000'+'\u0100'*9999 503 (+258%) 1901 (-5%) 359 (+402%) 1802
    utf-16le '\U00010000'*10000 484 (-14%) 379 (+9%) 103 (+303%) 415
    utf-16le '\U00010000'+'A'*9999 504 (+244%) 1905 (-9%) 353 (+392%) 1735
    utf-16le '\U00010000'+'\x80'*9999 503 (+245%) 1899 (-9%) 348 (+398%) 1733
    utf-16le '\U00010000'+'\u0100'*9999 503 (+244%) 1882 (-8%) 348 (+397%) 1729
    utf-16le '\U00010000'+'\u8000'*9999 503 (-18%) 355 (+16%) 71 (+482%) 413

    utf-16be 'A'*10000 504 (+284%) 1553 (+24%) 469 (+312%) 1933
    utf-16be '\x80'*10000 504 (+251%) 1551 (+14%) 387 (+357%) 1770
    utf-16be '\x80'+'A'*9999 504 (+261%) 1549 (+17%) 386 (+371%) 1819
    utf-16be '\u0100'*10000 503 (+175%) 1544 (-10%) 333 (+316%) 1384
    utf-16be '\u0100'+'A'*9999 505 (+178%) 1548 (-9%) 335 (+319%) 1403
    utf-16be '\u0100'+'\x80'*9999 503 (+179%) 1552 (-9%) 336 (+318%) 1405
    utf-16be '\u8000'*10000 503 (-2%) 415 (+19%) 75 (+559%) 494
    utf-16be '\u8000'+'A'*9999 504 (+179%) 1551 (-9%) 335 (+320%) 1408
    utf-16be '\u8000'+'\x80'*9999 504 (+178%) 1551 (-10%) 336 (+317%) 1402
    utf-16be '\u8000'+'\u0100'*9999 504 (+179%) 1549 (-9%) 336 (+318%) 1404
    utf-16be '\U00010000'*10000 483 (-7%) 407 (+10%) 105 (+326%) 447
    utf-16be '\U00010000'+'A'*9999 504 (+149%) 1554 (-19%) 317 (+295%) 1253
    utf-16be '\U00010000'+'\x80'*9999 503 (+153%) 1543 (-17%) 317 (+302%) 1275
    utf-16be '\U00010000'+'\u0100'*9999 503 (+153%) 1537 (-17%) 317 (+302%) 1274
    utf-16be '\U00010000'+'\u8000'*9999 503 (-2%) 415 (+19%) 71 (+597%) 495

    On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

    Columns: codec, input, Py2.7, Py3.2, Py3.3, patch.

    utf-16le 'A'*10000 136 (+417%) 584 (+20%) 184 (+282%) 703
    utf-16le '\x80'*10000 136 (+392%) 580 (+15%) 160 (+318%) 669
    utf-16le '\x80'+'A'*9999 136 (+398%) 582 (+16%) 159 (+326%) 677
    utf-16le '\u0100'*10000 137 (+346%) 583 (+5%) 129 (+374%) 611
    utf-16le '\u0100'+'A'*9999 136 (+358%) 582 (+7%) 129 (+383%) 623
    utf-16le '\u0100'+'\x80'*9999 136 (+348%) 580 (+5%) 129 (+372%) 609
    utf-16le '\u8000'*10000 136 (+18%) 127 (+27%) 38 (+324%) 161
    utf-16le '\u8000'+'A'*9999 136 (+357%) 582 (+7%) 129 (+382%) 622
    utf-16le '\u8000'+'\x80'*9999 136 (+351%) 581 (+6%) 128 (+380%) 614
    utf-16le '\u8000'+'\u0100'*9999 136 (+349%) 581 (+5%) 129 (+374%) 611
    utf-16le '\U00010000'*10000 153 (-3%) 140 (+6%) 53 (+181%) 149
    utf-16le '\U00010000'+'A'*9999 136 (+296%) 581 (-7%) 131 (+311%) 538
    utf-16le '\U00010000'+'\x80'*9999 136 (+289%) 584 (-9%) 131 (+304%) 529
    utf-16le '\U00010000'+'\u0100'*9999 136 (+290%) 579 (-8%) 130 (+308%) 530
    utf-16le '\U00010000'+'\u8000'*9999 136 (+25%) 128 (+33%) 38 (+347%) 170

    utf-16be 'A'*10000 136 (+331%) 441 (+33%) 166 (+253%) 586
    utf-16be '\x80'*10000 136 (+309%) 440 (+26%) 145 (+283%) 556
    utf-16be '\x80'+'A'*9999 136 (+312%) 442 (+27%) 145 (+286%) 560
    utf-16be '\u0100'*10000 136 (+231%) 441 (+2%) 120 (+275%) 450
    utf-16be '\u0100'+'A'*9999 136 (+232%) 442 (+2%) 120 (+276%) 451
    utf-16be '\u0100'+'\x80'*9999 136 (+231%) 438 (+3%) 119 (+278%) 450
    utf-16be '\u8000'*10000 136 (+22%) 127 (+31%) 38 (+337%) 166
    utf-16be '\u8000'+'A'*9999 136 (+232%) 439 (+3%) 120 (+276%) 451
    utf-16be '\u8000'+'\x80'*9999 136 (+230%) 439 (+2%) 120 (+274%) 449
    utf-16be '\u8000'+'\u0100'*9999 136 (+232%) 439 (+3%) 120 (+276%) 451
    utf-16be '\U00010000'*10000 153 (-1%) 139 (+9%) 52 (+192%) 152
    utf-16be '\U00010000'+'A'*9999 136 (+211%) 440 (-4%) 121 (+250%) 423
    utf-16be '\U00010000'+'\x80'*9999 136 (+210%) 440 (-4%) 122 (+246%) 422
    utf-16be '\U00010000'+'\u0100'*9999 136 (+210%) 441 (-5%) 121 (+248%) 421
    utf-16be '\U00010000'+'\u8000'*9999 136 (+27%) 128 (+35%) 38 (+355%) 173
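
Throughput figures of this kind can be approximated with a short timing loop. The sketch below is not the attached decodebench.py; the function name `decode_speed_mb_s`, the sample list, and the choice of 1 MB = 10**6 bytes of encoded input are assumptions made for illustration.

```python
import timeit


def decode_speed_mb_s(text: str, encoding: str, repeat: int = 5, number: int = 200) -> float:
    """Best-of-`repeat` decoding throughput, in MB (10**6 bytes) of input per second."""
    data = text.encode(encoding)
    # Take the fastest run to reduce noise from other processes.
    best = min(timeit.repeat(lambda: data.decode(encoding), repeat=repeat, number=number))
    return len(data) * number / best / 1e6


if __name__ == "__main__":
    # A few inputs in the style of the tables above (hypothetical selection).
    samples = ["A" * 10000, "\x80" * 10000, "\u0100" + "A" * 9999, "\U00010000" * 10000]
    for encoding in ("utf-16le", "utf-16be"):
        for text in samples:
            print(f"{encoding}  {len(text)} chars, max U+{max(map(ord, text)):04X}: "
                  f"{decode_speed_mb_s(text, encoding):8.0f} MB/s")
```

Absolute numbers depend heavily on the machine and the interpreter build, so only ratios between builds measured on the same machine are meaningful.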

    @pitrou
    Member

    pitrou commented Apr 23, 2012

    64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

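    Columns: codec, input, vanilla 3.3, patched. The percentage in parentheses is the change of the patched build relative to vanilla 3.3.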

    utf-16le 'A'*10000 1384 (+278%) 5233
    utf-16le 'A'*9999+'\x80' 1303 (+259%) 4684
    utf-16le 'A'*9999+'\u0100' 953 (+195%) 2813
    utf-16le 'A'*9999+'\u8000' 953 (+195%) 2814
    utf-16le 'A'*9999+'\U00010000' 979 (+197%) 2903
    utf-16le '\x80'*10000 1243 (+321%) 5230
    utf-16le '\x80'+'A'*9999 1256 (+313%) 5188
    utf-16le '\x80'*9999+'\u0100' 880 (+214%) 2765
    utf-16le '\x80'*9999+'\u8000' 880 (+214%) 2763
    utf-16le '\x80'*9999+'\U00010000' 899 (+218%) 2860
    utf-16le '\u0100'*10000 1047 (+370%) 4917
    utf-16le '\u0100'+'A'*9999 1046 (+369%) 4906
    utf-16le '\u0100'+'\x80'*9999 1047 (+370%) 4920
    utf-16le '\u0100'*9999+'\u8000' 1047 (+369%) 4906
    utf-16le '\u0100'*9999+'\U00010000' 791 (+253%) 2793
    utf-16le '\u8000'*10000 230 (+410%) 1173
    utf-16le '\u8000'+'A'*9999 1043 (+371%) 4911
    utf-16le '\u8000'+'\x80'*9999 1044 (+345%) 4645
    utf-16le '\u8000'+'\u0100'*9999 1041 (+350%) 4681
    utf-16le '\u8000'*9999+'\U00010000' 215 (+357%) 983
    utf-16le '\U00010000'*10000 362 (+170%) 976
    utf-16le '\U00010000'+'A'*9999 985 (+210%) 3052
    utf-16le '\U00010000'+'\x80'*9999 985 (+211%) 3066
    utf-16le '\U00010000'+'\u0100'*9999 983 (+209%) 3042
    utf-16le '\U00010000'+'\u8000'*9999 245 (+329%) 1052

    utf-16be 'A'*10000 1268 (+313%) 5240
    utf-16be 'A'*9999+'\x80' 1199 (+297%) 4758
    utf-16be 'A'*9999+'\u0100' 896 (+211%) 2786
    utf-16be 'A'*9999+'\u8000' 897 (+211%) 2788
    utf-16be 'A'*9999+'\U00010000' 919 (+214%) 2885
    utf-16be '\x80'*10000 1154 (+341%) 5087
    utf-16be '\x80'+'A'*9999 1155 (+343%) 5112
    utf-16be '\x80'*9999+'\u0100' 829 (+229%) 2728
    utf-16be '\x80'*9999+'\u8000' 828 (+229%) 2726
    utf-16be '\x80'*9999+'\U00010000' 852 (+232%) 2832
    utf-16be '\u0100'*10000 981 (+332%) 4241
    utf-16be '\u0100'+'A'*9999 981 (+330%) 4220
    utf-16be '\u0100'+'\x80'*9999 977 (+331%) 4213
    utf-16be '\u0100'*9999+'\u8000' 982 (+331%) 4237
    utf-16be '\u0100'*9999+'\U00010000' 748 (+237%) 2520
    utf-16be '\u8000'*10000 230 (+413%) 1180
    utf-16be '\u8000'+'A'*9999 979 (+331%) 4218
    utf-16be '\u8000'+'\x80'*9999 974 (+333%) 4215
    utf-16be '\u8000'+'\u0100'*9999 972 (+335%) 4226
    utf-16be '\u8000'*9999+'\U00010000' 215 (+361%) 992
    utf-16be '\U00010000'*10000 362 (+170%) 978
    utf-16be '\U00010000'+'A'*9999 924 (+232%) 3064
    utf-16be '\U00010000'+'\x80'*9999 921 (+223%) 2979
    utf-16be '\U00010000'+'\u0100'*9999 921 (+233%) 3064
    utf-16be '\U00010000'+'\u8000'*9999 245 (+329%) 1052
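
The percentages in parentheses are consistent with the rounded relative change of the patched figure against the figure they follow. A minimal sketch of that arithmetic (the helper name `relative_change` is hypothetical):

```python
def relative_change(baseline: float, patched: float) -> int:
    """Percent change of `patched` relative to `baseline`, rounded to an integer."""
    return round((patched / baseline - 1) * 100)


# utf-16le 'A'*10000: vanilla 3.3 = 1384 MB/s, patched = 5233 MB/s
print(relative_change(1384, 5233))  # prints 278, matching the "+278%" in the table
```

A value of +278% therefore means the patched build decodes that input about 3.8 times as fast as the baseline.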

    @python-dev
    Mannequin

    python-dev mannequin commented May 3, 2012

    New changeset 830eeff4fe8f by Victor Stinner in branch 'default':
    Issue bpo-14624, bpo-14687: Optimize unicode_widen()
    http://hg.python.org/cpython/rev/830eeff4fe8f

    @serhiy-storchaka
    Member Author

    Here is an updated patch that takes into account that unicode_widen is
    already optimized.

    @serhiy-storchaka
    Member Author

    The patch is updated for stylistic consistency with the UTF-8 decoder. Decoding of UCS2 non-surrogate characters is also a little faster (+15%).
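
For readers unfamiliar with the character classes used in the benchmarks, the sketch below classifies the sample characters by the per-character storage width that PEP 393 requires and by the number of 16-bit code units UTF-16 uses for them. The helper names `pep393_width` and `utf16_units` are hypothetical and exist only for this illustration.

```python
def pep393_width(s: str) -> int:
    """Bytes per character that PEP 393 needs to store s (1, 2 or 4)."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1  # UCS1: ASCII and Latin-1
    if top < 0x10000:
        return 2  # UCS2: rest of the Basic Multilingual Plane
    return 4      # UCS4: supplementary planes


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 uses for one character."""
    return len(ch.encode("utf-16le")) // 2


for ch in ("A", "\x80", "\u0100", "\u8000", "\U00010000"):
    print(f"U+{ord(ch):04X}: width={pep393_width(ch)} units={utf16_units(ch)}")
```

Only the last sample, U+10000, needs two code units (a surrogate pair); the others are single non-surrogate code units.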

    @pitrou
    Member

    pitrou commented May 13, 2012

    New performance figures under 64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

    Columns: codec, input, vanilla 3.3, patched.

    utf-16le 'A'*10000 1411 (+290%) 5504
    utf-16le 'A'*9999+'\x80' 1368 (+263%) 4970
    utf-16le 'A'*9999+'\u0100' 1145 (+151%) 2871
    utf-16le 'A'*9999+'\u8000' 1144 (+151%) 2870
    utf-16le 'A'*9999+'\U00010000' 1164 (+154%) 2957
    utf-16le '\x80'*10000 1403 (+271%) 5209
    utf-16le '\x80'+'A'*9999 1406 (+272%) 5235
    utf-16le '\x80'*9999+'\u0100' 1138 (+138%) 2713
    utf-16le '\x80'*9999+'\u8000' 1138 (+139%) 2716
    utf-16le '\x80'*9999+'\U00010000' 1155 (+151%) 2897
    utf-16le '\u0100'*10000 1477 (+243%) 5062
    utf-16le '\u0100'+'A'*9999 1478 (+243%) 5072
    utf-16le '\u0100'+'\x80'*9999 1477 (+243%) 5062
    utf-16le '\u0100'*9999+'\u8000' 1478 (+242%) 5055
    utf-16le '\u0100'*9999+'\U00010000' 1201 (+131%) 2776
    utf-16le '\u8000'*10000 246 (+347%) 1100
    utf-16le '\u8000'+'A'*9999 1475 (+244%) 5069
    utf-16le '\u8000'+'\x80'*9999 1474 (+243%) 5062
    utf-16le '\u8000'+'\u0100'*9999 1473 (+243%) 5057
    utf-16le '\u8000'*9999+'\U00010000' 236 (+295%) 932
    utf-16le '\U00010000'*10000 393 (+164%) 1039
    utf-16le '\U00010000'+'A'*9999 1325 (+134%) 3106
    utf-16le '\U00010000'+'\x80'*9999 1326 (+134%) 3103
    utf-16le '\U00010000'+'\u0100'*9999 1326 (+134%) 3104
    utf-16le '\U00010000'+'\u8000'*9999 253 (+331%) 1091

    utf-16be 'A'*10000 1341 (+298%) 5342
    utf-16be 'A'*9999+'\x80' 1305 (+275%) 4888
    utf-16be 'A'*9999+'\u0100' 1101 (+157%) 2834
    utf-16be 'A'*9999+'\u8000' 1102 (+157%) 2831
    utf-16be 'A'*9999+'\U00010000' 1115 (+162%) 2917
    utf-16be '\x80'*10000 1326 (+296%) 5253
    utf-16be '\x80'+'A'*9999 1322 (+298%) 5258
    utf-16be '\x80'*9999+'\u0100' 1088 (+156%) 2781
    utf-16be '\x80'*9999+'\u8000' 1088 (+155%) 2770
    utf-16be '\x80'*9999+'\U00010000' 1103 (+159%) 2854
    utf-16be '\u0100'*10000 1344 (+221%) 4308
    utf-16be '\u0100'+'A'*9999 1342 (+223%) 4330
    utf-16be '\u0100'+'\x80'*9999 1343 (+221%) 4307
    utf-16be '\u0100'*9999+'\u8000' 1343 (+221%) 4306
    utf-16be '\u0100'*9999+'\U00010000' 1109 (+128%) 2529
    utf-16be '\u8000'*10000 248 (+341%) 1094
    utf-16be '\u8000'+'A'*9999 1340 (+223%) 4331
    utf-16be '\u8000'+'\x80'*9999 1341 (+221%) 4307
    utf-16be '\u8000'+'\u0100'*9999 1341 (+221%) 4309
    utf-16be '\u8000'*9999+'\U00010000' 239 (+290%) 931
    utf-16be '\U00010000'*10000 399 (+160%) 1037
    utf-16be '\U00010000'+'A'*9999 1230 (+152%) 3101
    utf-16be '\U00010000'+'\x80'*9999 1218 (+154%) 3095
    utf-16be '\U00010000'+'\u0100'*9999 1220 (+154%) 3095
    utf-16be '\U00010000'+'\u8000'*9999 257 (+318%) 1074

    @serhiy-storchaka
    Member Author

    The patch is updated with slightly clearer code and added comments.

    @serhiy-storchaka
    Member Author

    Here are two new patches. The check for out-of-range characters has
    been moved, which makes the code simpler. In theory this slightly slows
    down decoding of short UCS1 strings (up to 1 and 3 characters on 32-
    and 64-bit platforms, respectively), but in practice there is no
    difference. The second patch differs from the first in that the masks
    are not calculated but specified explicitly. I am not sure that this
    improves readability. The committer can choose.
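
The comment does not show the masks themselves. As a rough illustration of how a mask can test several UTF-16 code units with one operation, the sketch below checks four little-endian code units per 64-bit word. This is an assumption about the general technique, not code taken from the patch; `UCS1_MASK_64` and `all_ucs1_le` are hypothetical names.

```python
# High byte of each of the four 16-bit code units in a 64-bit word.
UCS1_MASK_64 = 0xFF00FF00FF00FF00


def all_ucs1_le(data: bytes) -> bool:
    """Return True if every UTF-16-LE code unit in `data` is below 0x100."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    full = len(data) - len(data) % 8
    # Fast path: one AND per four code units.
    for i in range(0, full, 8):
        word = int.from_bytes(data[i:i + 8], "little")
        if word & UCS1_MASK_64:
            return False
    # Tail of up to three code units, checked one at a time.
    for i in range(full, len(data), 2):
        if int.from_bytes(data[i:i + 2], "little") > 0xFF:
            return False
    return True


print(all_ucs1_le(("A" * 10).encode("utf-16le")))
print(all_ucs1_le(("A" * 9 + "\u0100").encode("utf-16le")))
```

In Python the word-sized test brings no speed benefit; the point is only to show that a single mask can replace four separate range checks, with a short tail handled unit by unit.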

    @python-dev
    Mannequin

    python-dev mannequin commented May 15, 2012

    New changeset cdcc816dea85 by Antoine Pitrou in branch 'default':
    Issue bpo-14624: UTF-16 decoding is now 3x to 4x faster on various inputs.
    http://hg.python.org/cpython/rev/cdcc816dea85

    @pitrou
    Member

    pitrou commented May 15, 2012

    Thank you Serhiy! Now committed.

    @pitrou pitrou closed this as completed May 15, 2012
    @serhiy-storchaka
    Member Author

    Thank you, Antoine. Now only bpo-14625 waits for review.

    changeset: 77012:3430d7329a3b
    +* UTF-8 and UTF-16 decoding is now 2x to 4x faster.

    In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2 on my computers (and sometimes still a little slower). It is 2x to 4x faster only compared to the earlier Python 3.3, which PEP 393 had slowed down.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022