
Author serhiy.storchaka
Recipients Arfrever, asvetlov, pitrou, serhiy.storchaka, vstinner
Date 2012-05-11.19:24:53
Message-id <1336764295.4.0.606718043314.issue14625@psf.upfronthosting.co.za>
In-reply-to
Content
The patches have been updated to conform stylistically to the UTF-8 decoder. Patch B is now significantly faster for aligned input data (i.e. almost always), especially for native byte order. The UTF-32 decoder can now be faster than the ASCII decoder! Maybe it is time to change the title to "Amazingly faster UTF-32 decoding"? ;)
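
A rough way to reproduce this kind of measurement is to time `bytes.decode` on pre-encoded data. The following is only a sketch and not the script used to produce the table below: `decode_rate` is a hypothetical helper, and reporting MB/s is an assumption, since the table does not state its unit.

```python
import sys
import timeit

def decode_rate(codec, text, repeat=5, number=200):
    """Return decode throughput in MB/s, taking the best of `repeat` runs."""
    data = text.encode(codec)
    best = min(timeit.repeat(lambda: data.decode(codec),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6

# Pick the codec that matches this machine's byte order, and its opposite.
native = "utf-32le" if sys.byteorder == "little" else "utf-32be"
swapped = "utf-32be" if sys.byteorder == "little" else "utf-32le"

# A few of the inputs from the table (assumed MB/s; absolute values vary by machine).
samples = ["A" * 10000, "A" * 9999 + "\u0100", "\U00010000" * 10000]

for codec in (native, swapped):
    for text in samples:
        print(f"{codec}  {ascii(text[:1])}...  {decode_rate(codec, text):8.0f} MB/s")
```

Comparing the native and swapped codecs on the same input shows how much the byte-order fast path matters on a given build.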

                                          Py3.2         Py3.3         patchA       patchB

utf-32le  'A'*10000                       162 (+462%)   100 (+810%)   391 (+133%)   910
utf-32le      'A'*9999+'\x80'             162 (+411%)   99 (+736%)    377 (+120%)   828
utf-32le      'A'*9999+'\u0100'           162 (+277%)   95 (+543%)    324 (+89%)    611
utf-32le      'A'*9999+'\u8000'           162 (+278%)   95 (+545%)    324 (+89%)    613
utf-32le      'A'*9999+'\U00010000'       162 (+280%)   95 (+547%)    322 (+91%)    615
utf-32le  '\x80'*10000                    162 (+436%)   94 (+823%)    389 (+123%)   868
utf-32le    '\x80'+'A'*9999               162 (+441%)   94 (+832%)    388 (+126%)   876
utf-32le      '\x80'*9999+'\u0100'        162 (+273%)   90 (+571%)    320 (+89%)    604
utf-32le      '\x80'*9999+'\u8000'        162 (+271%)   90 (+568%)    319 (+88%)    601
utf-32le      '\x80'*9999+'\U00010000'    162 (+268%)   90 (+562%)    318 (+87%)    596
utf-32le  '\u0100'*10000                  161 (+445%)   83 (+958%)    405 (+117%)   878
utf-32le    '\u0100'+'A'*9999             162 (+440%)   83 (+954%)    403 (+117%)   875
utf-32le    '\u0100'+'\x80'*9999          162 (+444%)   83 (+963%)    403 (+119%)   882
utf-32le      '\u0100'*9999+'\u8000'      162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le      '\u0100'*9999+'\U00010000'  162 (+259%)   79 (+637%)    325 (+79%)    582
utf-32le  '\u8000'*10000                  162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'A'*9999             162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'\x80'*9999          161 (+448%)   83 (+964%)    403 (+119%)   883
utf-32le    '\u8000'+'\u0100'*9999        161 (+443%)   83 (+954%)    402 (+118%)   875
utf-32le      '\u8000'*9999+'\U00010000'  162 (+262%)   79 (+643%)    325 (+81%)    587
utf-32le  '\U00010000'*10000              149 (+483%)   83 (+947%)    390 (+123%)   869
utf-32le    '\U00010000'+'A'*9999         162 (+444%)   83 (+963%)    389 (+127%)   882
utf-32le    '\U00010000'+'\x80'*9999      162 (+430%)   83 (+935%)    389 (+121%)   859
utf-32le    '\U00010000'+'\u0100'*9999    162 (+429%)   83 (+933%)    389 (+120%)   857
utf-32le    '\U00010000'+'\u8000'*9999    162 (+431%)   83 (+937%)    388 (+122%)   861

utf-32be  'A'*10000                       162 (+199%)   100 (+384%)   393 (+23%)    484
utf-32be      'A'*9999+'\x80'             162 (+186%)   99 (+368%)    376 (+23%)    463
utf-32be      'A'*9999+'\u0100'           162 (+138%)   95 (+306%)    323 (+20%)    386
utf-32be      'A'*9999+'\u8000'           162 (+139%)   95 (+307%)    323 (+20%)    387
utf-32be      'A'*9999+'\U00010000'       162 (+138%)   95 (+305%)    322 (+20%)    385
utf-32be  '\x80'*10000                    161 (+196%)   94 (+407%)    389 (+23%)    477
utf-32be    '\x80'+'A'*9999               161 (+197%)   94 (+409%)    387 (+24%)    478
utf-32be      '\x80'*9999+'\u0100'        161 (+137%)   90 (+324%)    321 (+19%)    382
utf-32be      '\x80'*9999+'\u8000'        162 (+135%)   89 (+328%)    320 (+19%)    381
utf-32be      '\x80'*9999+'\U00010000'    162 (+134%)   89 (+326%)    318 (+19%)    379
utf-32be  '\u0100'*10000                  161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be    '\u0100'+'A'*9999             161 (+196%)   83 (+475%)    402 (+19%)    477
utf-32be    '\u0100'+'\x80'*9999          162 (+196%)   83 (+477%)    403 (+19%)    479
utf-32be      '\u0100'*9999+'\u8000'      161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be      '\u0100'*9999+'\U00010000'  162 (+131%)   79 (+373%)    325 (+15%)    374
utf-32be  '\u8000'*10000                  161 (+195%)   83 (+472%)    404 (+18%)    475
utf-32be    '\u8000'+'A'*9999             161 (+197%)   83 (+476%)    402 (+19%)    478
utf-32be    '\u8000'+'\x80'*9999          161 (+197%)   83 (+476%)    403 (+19%)    478
utf-32be    '\u8000'+'\u0100'*9999        162 (+194%)   83 (+473%)    403 (+18%)    476
utf-32be      '\u8000'*9999+'\U00010000'  161 (+133%)   79 (+375%)    325 (+15%)    375
utf-32be  '\U00010000'*10000              148 (+222%)   83 (+473%)    391 (+22%)    476
utf-32be    '\U00010000'+'A'*9999         161 (+198%)   83 (+477%)    389 (+23%)    479
utf-32be    '\U00010000'+'\x80'*9999      162 (+194%)   83 (+473%)    389 (+22%)    476
utf-32be    '\U00010000'+'\u0100'*9999    162 (+194%)   83 (+475%)    389 (+23%)    477
utf-32be    '\U00010000'+'\u8000'*9999    161 (+196%)   83 (+475%)    389 (+23%)    477
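
The percentages in parentheses appear to be the gain of patch B over the value in that column; the first utf-32le row is consistent with that reading. A minimal check, where `gain_percent` is a hypothetical helper name:

```python
def gain_percent(baseline, patched):
    """Relative gain of `patched` over `baseline`, rounded to a whole percent."""
    return round((patched / baseline - 1) * 100)

# First utf-32le row: 'A'*10000
row = {"Py3.2": 162, "Py3.3": 100, "patchA": 391}
patch_b = 910

for name, value in row.items():
    print(f"{name}: {value} (+{gain_percent(value, patch_b)}%)")
```

Under this reading, a smaller percentage in the patchA column means patch A is already close to patch B for that input.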
History
Date                 User              Action  Args
2012-05-11 19:24:55  serhiy.storchaka  set     recipients: + serhiy.storchaka, pitrou, vstinner, Arfrever, asvetlov
2012-05-11 19:24:55  serhiy.storchaka  set     messageid: <1336764295.4.0.606718043314.issue14625@psf.upfronthosting.co.za>
2012-05-11 19:24:54  serhiy.storchaka  link    issue14625 messages
2012-05-11 19:24:54  serhiy.storchaka  create