
Author serhiy.storchaka
Recipients Arfrever, ezio.melotti, janssen, jcea, loewis, mark.dickinson, ned.deily, pitrou, python-dev, ronaldoussoren, serhiy.storchaka, vstinner
Date 2012-05-26.09:11:06
Message-id <1338023467.99.0.0909826635798.issue14923@psf.upfronthosting.co.za>
In-reply-to
Content
Strange as it may seem, a simple trick has made UTF-8 decoding even faster.

Here are the benchmark results. The percentage in parentheses is the change of the patched build relative to vanilla.

On 32-bit Linux, AMD Athlon 64 X2:

                                          vanilla      patched

utf-8     'A'*10000                       2061 (+3%)   2115
utf-8     '\x80'*10000                    383 (-7%)    355
utf-8       '\x80'+'A'*9999               1273 (+1%)   1290
utf-8     '\u0100'*10000                  382 (+47%)   562
utf-8       '\u0100'+'A'*9999             1239 (+1%)   1253
utf-8       '\u0100'+'\x80'*9999          383 (+47%)   562
utf-8     '\u8000'*10000                  434 (-6%)    409
utf-8       '\u8000'+'A'*9999             1245 (+1%)   1256
utf-8       '\u8000'+'\x80'*9999          382 (+47%)   560
utf-8       '\u8000'+'\u0100'*9999        383 (+44%)   553
utf-8     '\U00010000'*10000              358 (+4%)    373
utf-8       '\U00010000'+'A'*9999         1171 (+0%)   1176
utf-8       '\U00010000'+'\x80'*9999      381 (+44%)   548
utf-8       '\U00010000'+'\u0100'*9999    381 (+44%)   548
utf-8       '\U00010000'+'\u8000'*9999    404 (+0%)    406

On 32-bit Linux, Intel Atom N570:

                                          vanilla      patched

utf-8     'A'*10000                       623 (+0%)    626
utf-8     '\x80'*10000                    145 (+15%)   167
utf-8       '\x80'+'A'*9999               354 (+2%)    362
utf-8     '\u0100'*10000                  164 (+10%)   181
utf-8       '\u0100'+'A'*9999             343 (-0%)    342
utf-8       '\u0100'+'\x80'*9999          164 (+11%)   182
utf-8     '\u8000'*10000                  175 (+5%)    183
utf-8       '\u8000'+'A'*9999             349 (+0%)    349
utf-8       '\u8000'+'\x80'*9999          164 (+11%)   182
utf-8       '\u8000'+'\u0100'*9999        164 (+10%)   181
utf-8     '\U00010000'*10000              152 (+11%)   168
utf-8       '\U00010000'+'A'*9999         313 (+0%)    313
utf-8       '\U00010000'+'\x80'*9999      161 (+11%)   179
utf-8       '\U00010000'+'\u0100'*9999    161 (+11%)   179
utf-8       '\U00010000'+'\u8000'*9999    160 (+11%)   177
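
The message shows neither the benchmark script nor the unit of the figures, so the following is only a rough sketch of how comparable numbers could be collected with the standard timeit module. The names decode_rate, percent_change and CASES are hypothetical, and reporting MB/s is an assumption. Only a few of the input patterns from the tables are included.

```python
import timeit

def decode_rate(text, encoding="utf-8", repeat=3, number=200):
    """Return decoding throughput in MB/s (assumed unit), best of `repeat` runs."""
    data = text.encode(encoding)
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    # Guard against a zero reading on very coarse timers.
    return len(data) * number / max(best, 1e-12) / 1e6

def percent_change(vanilla, patched):
    """Relative change of patched over vanilla, rounded to a whole percent."""
    return round((patched / vanilla - 1) * 100)

# A subset of the input patterns listed in the tables (labels are for display only).
CASES = {
    "'A'*10000": 'A' * 10000,
    r"'\x80'*10000": '\x80' * 10000,
    r"'\x80'+'A'*9999": '\x80' + 'A' * 9999,
    r"'\u0100'*10000": '\u0100' * 10000,
    r"'\u8000'*10000": '\u8000' * 10000,
    r"'\U00010000'*10000": '\U00010000' * 10000,
}

if __name__ == "__main__":
    for label, text in CASES.items():
        print("utf-8  %-28s %8.0f" % (label, decode_rate(text)))
```

Running the script once under each interpreter build and passing the two figures for a row to percent_change gives the parenthesised value, for example percent_change(382, 562) for the '\u0100'*10000 row on the Athlon.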
History
Date                 User              Action  Args
2012-05-26 09:11:08  serhiy.storchaka  set     recipients: + serhiy.storchaka, loewis, jcea, ronaldoussoren, mark.dickinson, janssen, pitrou, vstinner, ned.deily, ezio.melotti, Arfrever, python-dev
2012-05-26 09:11:07  serhiy.storchaka  set     messageid: <1338023467.99.0.0909826635798.issue14923@psf.upfronthosting.co.za>
2012-05-26 09:11:07  serhiy.storchaka  link    issue14923 messages
2012-05-26 09:11:06  serhiy.storchaka  create