Author xiang.zhang
Recipients serhiy.storchaka, vstinner, xiang.zhang
Date 2016-10-25.16:16:46
Message-id <1477412206.88.0.583380682603.issue28531@psf.upfronthosting.co.za>
In-reply-to
Content
Currently the utf7 encoder uses an aggressive memory allocation strategy: it allocates for a worst case of 8 bytes per character. We can tighten that worst case.

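For 1-byte and 2-byte unicode strings, the worst case is 3*n + 2 bytes, where n is the string length. For 4-byte unicode strings, the worst case is 6*n + 2, since a character outside the BMP is encoded as two 16-bit units.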

There are 2 cases. First, all characters need to be encoded: the result length is ceil(2.67*n) + 2 <= 3*n + 2 (16 bits per character at 6 bits per base64 byte, plus the opening '+' and closing '-'). Second, characters that need encoding alternate one by one with characters that do not. For even length, the result is 3n < 3n + 2. For odd length, it's exactly 3n + 2.
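The two cases can be checked empirically with a short script. This is only a sketch: the helper names `utf7_len` and `proposed_bound` and the sample strings are made up for illustration and are not part of the patch, and a handful of samples is not a proof of the bound.

```python
def utf7_len(s: str) -> int:
    """Length in bytes of s encoded with the built-in utf-7 codec."""
    return len(s.encode("utf-7"))


def proposed_bound(s: str) -> int:
    """Hypothetical helper: the tightened worst case described above.

    3*n + 2 when every character fits in 16 bits, otherwise 6*n + 2.
    """
    per_char = 3 if all(ord(c) <= 0xFFFF for c in s) else 6
    return per_char * len(s) + 2


# Illustrative samples (assumptions, chosen to hit each case).
samples = {
    "all encoded": "\xe9" * 7,
    "alternating, even length": "\xe9a" * 4,
    "alternating, odd length": "\xe9a" * 4 + "\xe9",
    "non-BMP": "\U0001F600" * 5,
    "plain ASCII": "abc" * 10,
}

for name, s in samples.items():
    n = len(s)
    # Columns: case, n, actual encoded length, proposed bound, current 8*n
    print(f"{name:26} n={n:3} actual={utf7_len(s):3} "
          f"bound={proposed_bound(s):3} current={8 * n:3}")
```

The odd-length alternating sample is the one that reaches 3n + 2 exactly; the others stay below the bound, and all of them are far below 8*n.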

This makes little difference when the string is short. But when the string is long, encoding gets noticeably faster: in the timings below, "abc"*10000 goes from 178 us to 102 us.

Without patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.79 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.55 us +- 0.13 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 14.0 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 178 us +- 1 us

With patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.87 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.50 us +- 0.23 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 13.3 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 102 us +- 1 us
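For anyone without the perf module at hand, a rough comparison can be made with the standard library alone. This is a minimal sketch, not the method used for the numbers above: `median_encode_time` is a hypothetical helper, and plain `timeit` results are noisier than perf's, so absolute values will differ.

```python
import statistics
import timeit


def median_encode_time(repeat_count: int, number: int = 1000) -> float:
    """Median seconds per call of s.encode("utf7") for s = "abc" * repeat_count."""
    s = "abc" * repeat_count
    # Five timing runs of `number` calls each; take the median run.
    runs = timeit.repeat(lambda: s.encode("utf7"), number=number, repeat=5)
    return statistics.median(runs) / number


if __name__ == "__main__":
    for count in (10, 100, 1000, 10000):
        print(f'"abc"*{count}: {median_encode_time(count) * 1e6:.2f} us')
```

Running it once on an unpatched build and once on a patched build gives the same kind of before/after comparison.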

The patch also removes a redundant check: base64bits can be nonzero only when inShift is nonzero.