
Author vstinner
Recipients loewis, python-dev, serhiy.storchaka, vstinner
Date 2012-11-07.22:53:52
Message-id <CAMpsgwYEtwOcJshPHNyYM6ypxAug74PTFAqcsgfiAt-djSpcFQ@mail.gmail.com>
In-reply-to <3Xx6hY4NfQzMP7@mail.python.org>
Content
Oh, I forgot my benchmark results.

decodebench.py results on Linux 32 bits:
(Linux-3.2.0-32-generic-pae-i686-with-debian-wheezy-sid)

$ ./python bench-diff.py original writer
ascii     'A'*10000                       4109 (-3%)    3974

latin1    'A'*10000                       3851 (-5%)    3644
latin1    '\x80'*10000                    14832 (-3%)   14430

utf-8     'A'*10000                       3747 (-4%)    3608
utf-8     '\x80'*10000                    976 (-2%)     961
utf-8     '\u0100'*10000                  974 (-2%)     959
utf-8     '\u8000'*10000                  804 (-14%)    694
utf-8     '\U00010000'*10000              666 (-5%)     635

utf-16le  'A'*10000                       4154 (-1%)    4117
utf-16le  '\x80'*10000                    4055 (-2%)    3988
utf-16le  '\u0100'*10000                  4047 (-2%)    3974
utf-16le  '\u8000'*10000                  917 (-1%)     912
utf-16le  '\U00010000'*10000              872 (-0%)     870

utf-16be  'A'*10000                       3218 (-1%)    3185
utf-16be  '\x80'*10000                    3163 (-2%)    3114
utf-16be  '\u0100'*10000                  2591 (-1%)    2556
utf-16be  '\u8000'*10000                  979 (-1%)     974
utf-16be  '\U00010000'*10000              928 (-0%)     925

utf-32le  'A'*10000                       1681 (+12%)   1885
utf-32le  '\x80'*10000                    1697 (+10%)   1865
utf-32le  '\u0100'*10000                  2224 (+1%)    2254
utf-32le  '\u8000'*10000                  2224 (+2%)    2269
utf-32le  '\U00010000'*10000              2234 (+1%)    2260

utf-32be  'A'*10000                       1685 (+11%)   1868
utf-32be  '\x80'*10000                    1684 (+10%)   1860
utf-32be  '\u0100'*10000                  2223 (+1%)    2253
utf-32be  '\u8000'*10000                  2222 (+1%)    2255
utf-32be  '\U00010000'*10000              2243 (+1%)    2257

decodebench.py results on Linux 64 bits:
(Linux-3.4.9-2.fc16.x86_64-x86_64-with-fedora-16-Verne)

ascii     'A'*10000                       10043 (+1%)   10144

latin1    'A'*10000                       8351 (-1%)    8258
latin1    '\x80'*10000                    19184 (+2%)   19560

utf-8     'A'*10000                       8083 (+5%)    8461
utf-8     '\x80'*10000                    982 (+1%)     993
utf-8     '\u0100'*10000                  984 (+1%)     992
utf-8     '\u8000'*10000                  806 (+31%)    1053
utf-8     '\U00010000'*10000              639 (+12%)    718

utf-16le  'A'*10000                       5547 (-2%)    5422
utf-16le  '\x80'*10000                    5205 (+1%)    5271
utf-16le  '\u0100'*10000                  4900 (-4%)    4695
utf-16le  '\u8000'*10000                  1062 (+9%)    1154
utf-16le  '\U00010000'*10000              1040 (+4%)    1078

utf-16be  'A'*10000                       5416 (-5%)    5157
utf-16be  '\x80'*10000                    5077 (-1%)    5011
utf-16be  '\u0100'*10000                  4261 (-1%)    4218
utf-16be  '\u8000'*10000                  1146 (+0%)    1147
utf-16be  '\U00010000'*10000              1125 (-1%)    1119

utf-32le  'A'*10000                       1743 (+8%)    1880
utf-32le  '\x80'*10000                    1751 (+5%)    1842
utf-32le  '\u0100'*10000                  2114 (+29%)   2721
utf-32le  '\u8000'*10000                  2120 (+28%)   2718
utf-32le  '\U00010000'*10000              2065 (+30%)   2690

utf-32be  'A'*10000                       1761 (+6%)    1860
utf-32be  '\x80'*10000                    1749 (+6%)    1856
utf-32be  '\u0100'*10000                  2101 (+29%)   2715
utf-32be  '\u8000'*10000                  2083 (+30%)   2715
utf-32be  '\U00010000'*10000              2058 (+31%)   2689
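
For readers who want to try a comparable measurement, here is a minimal sketch of timing a decode for one row of the tables. It is not decodebench.py: the helper name time_decode, the repeat counts, and the microsecond output are assumptions made for illustration, and since the units of the tables are not stated in this message, the numbers it prints are not directly comparable to the ones above.

```python
import timeit

def time_decode(encoding, text, number=1000):
    """Return the best time in seconds for one decode of text in the given encoding."""
    # Encode once up front so that only the decoder is timed.
    data = text.encode(encoding)
    timer = timeit.Timer(lambda: data.decode(encoding))
    # Take the best of three runs to reduce noise from other processes.
    return min(timer.repeat(repeat=3, number=number)) / number

# Same input shape as the '\u8000'*10000 rows, for three of the codecs.
for encoding in ("utf-8", "utf-16le", "utf-32le"):
    seconds = time_decode(encoding, "\u8000" * 10000)
    print("%-9s %.1f us per decode" % (encoding, seconds * 1e6))
```

Running the same script with two interpreter builds (for example the original one and the one using the writer) gives the two columns to compare.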

Most significant changes:
 * -14% to decode '\u8000'*10000 from UTF-8 on Linux 32 bits
 * +31% to decode '\u8000'*10000 from UTF-8 on Linux 64 bits
 * +28% to +31% to decode UCS-2 and UCS-4 characters from UTF-32 on Linux 64 bits
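
The percentages in the tables are consistent with the relative change of the second column against the first, rounded to a whole percent. A small sketch of that arithmetic, assuming this is how bench-diff.py computes them (the script itself is not shown here, and the function name percent_change is made up for the example):

```python
def percent_change(original, writer):
    """Relative change of writer versus original, formatted like the tables."""
    change = (writer - original) * 100.0 / original
    # "%+.0f" keeps the sign, so a tiny slowdown is rendered as "-0%".
    return "%+.0f%%" % change

print(percent_change(804, 694))    # utf-8 '\u8000' row, Linux 32 bits
print(percent_change(806, 1053))   # utf-8 '\u8000' row, Linux 64 bits
print(percent_change(2058, 2689))  # utf-32be '\U00010000' row, Linux 64 bits
```

This prints -14%, +31% and +31%, which matches the rows quoted in the list above.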

@Serhiy Storchaka: If you feel able to tune _PyUnicodeWriter to
improve its performance, please open a new issue.

I consider the performance changes acceptable and I don't plan to work
on this topic.
History
Date                 User      Action  Args
2012-11-07 22:53:54  vstinner  set     recipients: + vstinner, loewis, python-dev, serhiy.storchaka
2012-11-07 22:53:54  vstinner  link    issue16311 messages
2012-11-07 22:53:52  vstinner  create