
Author serhiy.storchaka
Recipients ezio.melotti, gvanrossum, kennyluck, lemburg, loewis, serhiy.storchaka, tchrist, vstinner
Date 2013-09-02.15:34:03
Message-id <1378136048.08.0.601314131306.issue12892@psf.upfronthosting.co.za>
Content
Here is a patch that combines both of Kang-Hao's patches, synchronized with tip, fixed, and optimized.

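Unfortunately, even optimized, this patch slows down encoding/decoding of some data. Here are some benchmark results (benchmarking tools are here: https://bitbucket.org/storchaka/cpython-stuff/src/default/bench). Higher numbers are faster; each percentage in parentheses is the change of the patched build relative to that column.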

3.3          3.4          3.4
             unpatched    patched

969 (+12%)   978 (+11%)   1087   encode  utf-16le  'A'*10000
2453 (-62%)  2356 (-61%)  923    encode  utf-16le  '\u0100'*10000
222 (+12%)   224 (+11%)   249    encode  utf-16le    '\U00010000'+'\u0100'*9999
784 (+6%)    791 (+5%)    831    encode  utf-16be  'A'*10000
750 (-4%)    752 (-4%)    719    encode  utf-16be  '\u0100'*10000
233 (+2%)    235 (+1%)    238    encode  utf-16be    '\U00010000'+'\u0100'*9999

531 (-7%)    545 (-9%)    494    encode  utf-32le  'A'*10000
383 (-38%)   385 (-38%)   239    encode  utf-32le  '\u0100'*10000
324 (-24%)   325 (-25%)   245    encode  utf-32le    '\U00010000'+'\u0100'*9999
544 (-10%)   545 (-10%)   492    encode  utf-32be  'A'*10000
384 (-38%)   384 (-38%)   239    encode  utf-32be  '\u0100'*10000
325 (-25%)   325 (-25%)   245    encode  utf-32be    '\U00010000'+'\u0100'*9999

682 (+5%)    679 (+5%)    715    decode  utf-16le  'A'*10000
607 (+1%)    610 (+1%)    614    decode  utf-16le  '\u0100'*10000
550 (+1%)    554 (+0%)    556    decode  utf-16le    '\U00010000'+'\u0100'*9999
609 (+0%)    600 (+2%)    610    decode  utf-16be  'A'*10000
464 (+1%)    466 (+1%)    470    decode  utf-16be  '\u0100'*10000
432 (+1%)    431 (+1%)    435    decode  utf-16be    '\U00010000'+'\u0100'*9999

103 (+272%)  374 (+2%)    383    decode  utf-32le  'A'*10000
91 (+264%)   390 (-15%)   331    decode  utf-32le  '\u0100'*10000
90 (+257%)   393 (-18%)   321    decode  utf-32le    '\U00010000'+'\u0100'*9999
103 (+269%)  393 (-3%)    380    decode  utf-32be  'A'*10000
91 (+263%)   406 (-19%)   330    decode  utf-32be  '\u0100'*10000
90 (+257%)   393 (-18%)   321    decode  utf-32be    '\U00010000'+'\u0100'*9999
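
For readers who want to try a comparable measurement, here is a minimal sketch using only the standard `timeit` module. It is not the benchmark tool linked above: the `throughput` and `run` helpers, the `CASES` names, and the choice of characters per second as the unit are all assumptions made for illustration, so absolute numbers will not match the table.

```python
import timeit

# The three test strings used in the table above.
CASES = {
    "ascii": "A" * 10000,
    "ucs2": "\u0100" * 10000,
    "ucs4": "\U00010000" + "\u0100" * 9999,
}


def throughput(func, nchars, repeat=5, number=100):
    """Hypothetical helper: best observed rate in characters per second."""
    # Take the minimum over several runs to reduce timing noise.
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return nchars * number / best


def run(codecs=("utf-16le", "utf-16be", "utf-32le", "utf-32be")):
    """Time encode and decode for every codec and test string."""
    results = {}
    for codec in codecs:
        for name, text in CASES.items():
            data = text.encode(codec)
            results["encode", codec, name] = throughput(
                lambda: text.encode(codec), len(text))
            results["decode", codec, name] = throughput(
                lambda: data.decode(codec), len(text))
    return results


if __name__ == "__main__":
    for (op, codec, name), rate in sorted(run().items()):
        print(f"{rate:14.0f}  {op}  {codec}  {name}")
```

Running the same script under an unpatched and a patched build gives two result sets that can be compared row by row.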

UTF-16 decoding performance is unchanged. The UTF-16 encoder is 2.5 times slower for UCS2 data (previously it was just a memcpy), but it is still 3+ times faster than in 2.7 and 3.2 (issue15026). Thanks to additional optimization, it is now even slightly faster for some other data. There is a patch to speed up UTF-32 encoding (issue15027), which should help compensate for its performance degradation. The UTF-32 decoder is already 3-4 times faster than in 3.3 (issue14625).
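
The ratios quoted above can be checked against the table with a few lines of arithmetic. This is only a sketch: `relative_change` is a hypothetical helper name, and it assumes the table values are rates where higher is faster.

```python
def relative_change(patched, baseline):
    """Percent change of the patched rate relative to a baseline rate."""
    return round((patched / baseline - 1) * 100)


# encode utf-16le 'A'*10000: patched 1087 vs 3.3 at 969
ascii_change = relative_change(1087, 969)

# encode utf-16le '\u0100'*10000: patched 923 vs 3.3 at 2453
ucs2_change = relative_change(923, 2453)

# Slowdown factor of the UTF-16 encoder on UCS2 data (3.4 unpatched vs patched)
utf16_slowdown = 2356 / 923

# Speedup of the UTF-32 decoder between 3.3 and 3.4 unpatched
utf32_decode_speedup = 374 / 103

print(ascii_change, ucs2_change)
print(f"{utf16_slowdown:.2f}x slower, {utf32_decode_speedup:.2f}x faster")
```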

I don't see any performance objection to this patch.
History
Date                 User              Action  Args
2013-09-02 15:34:08  serhiy.storchaka  set     recipients: + serhiy.storchaka, lemburg, gvanrossum, loewis, vstinner, ezio.melotti, tchrist, kennyluck
2013-09-02 15:34:08  serhiy.storchaka  set     messageid: <1378136048.08.0.601314131306.issue12892@psf.upfronthosting.co.za>
2013-09-02 15:34:08  serhiy.storchaka  link    issue12892 messages
2013-09-02 15:34:06  serhiy.storchaka  create