Author serhiy.storchaka
Recipients Arfrever, ezio.melotti, janssen, jcea, loewis, mark.dickinson, ned.deily, pitrou, python-dev, ronaldoussoren, serhiy.storchaka, vstinner
Date 2012-05-27.15:49:47
Message-id <1338133789.47.0.308833193181.issue14923@psf.upfronthosting.co.za>

Yes, this is implementation-dependent behavior (although on the supported platforms it is implemented in a consistent, well-defined way).

However, if the continuation-byte check is done the simplest way, ((ch) >= 0x80 && (ch) < 0xC0), it gives the same effect (a speedup of up to +45%) on AMD Athlon.

                                          vanilla      patched

utf-8     'A'*10000                       2061 (-2%)   2018
utf-8     '\x80'*10000                    383 (+9%)    416
utf-8       '\x80'+'A'*9999               1273 (+3%)   1315
utf-8     '\u0100'*10000                  382 (+46%)   558
utf-8       '\u0100'+'A'*9999             1239 (+0%)   1245
utf-8       '\u0100'+'\x80'*9999          383 (+46%)   558
utf-8     '\u8000'*10000                  434 (-6%)    408
utf-8       '\u8000'+'A'*9999             1245 (+0%)   1245
utf-8       '\u8000'+'\x80'*9999          382 (+46%)   556
utf-8       '\u8000'+'\u0100'*9999        383 (+45%)   556
utf-8     '\U00010000'*10000              358 (+0%)    359
utf-8       '\U00010000'+'A'*9999         1171 (-0%)   1170
utf-8       '\U00010000'+'\x80'*9999      381 (+30%)   495
utf-8       '\U00010000'+'\u0100'*9999    381 (+30%)   495
utf-8       '\U00010000'+'\u8000'*9999    404 (-5%)    385

On Intel Atom the results stayed the same or became slightly better.

                                          vanilla      patched

utf-8     'A'*10000                       623 (+3%)    642
utf-8     '\x80'*10000                    145 (+9%)    158
utf-8       '\x80'+'A'*9999               354 (+4%)    367
utf-8     '\u0100'*10000                  164 (+0%)    164
utf-8       '\u0100'+'A'*9999             343 (+2%)    351
utf-8       '\u0100'+'\x80'*9999          164 (+1%)    165
utf-8     '\u8000'*10000                  175 (-2%)    171
utf-8       '\u8000'+'A'*9999             349 (+3%)    359
utf-8       '\u8000'+'\x80'*9999          164 (+0%)    164
utf-8       '\u8000'+'\u0100'*9999        164 (+0%)    164
utf-8     '\U00010000'*10000              152 (-1%)    150
utf-8       '\U00010000'+'A'*9999         313 (+2%)    319
utf-8       '\U00010000'+'\x80'*9999      161 (+1%)    162
utf-8       '\U00010000'+'\u0100'*9999    161 (+1%)    162
utf-8       '\U00010000'+'\u8000'*9999    160 (-2%)    156