Message257806
In Python 2, PyUnicode_FromObject() was used to coerce 8-bit strings to unicode by decoding them with the default encoding. Python 3 has no such coercion. There, PyUnicode_FromObject() only ensures that the argument is a string and converts an instance of a str subtype to an exact str. That conversion is often just a waste of memory and time, since the resulting string is used only to retrieve its UTF-8 representation or raw data.
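The Python 3 behaviour described above can be observed from Python itself. The following is only a sketch: it assumes CPython (it reaches the C API through ctypes.pythonapi), and the class S and the helper rejects_bytes() are hypothetical names used for illustration.

```python
import ctypes

# CPython-only assumption: call the C API function directly through ctypes.
from_object = ctypes.pythonapi.PyUnicode_FromObject
from_object.argtypes = [ctypes.py_object]
from_object.restype = ctypes.py_object


class S(str):  # hypothetical str subtype, used only for illustration
    pass


exact = "abc"
sub = S("abc")

same = from_object(exact)   # exact str: the same object comes back
copied = from_object(sub)   # str subtype: content is copied into a new exact str


def rejects_bytes():
    """Return True if a bytes argument is refused with TypeError."""
    try:
        from_object(b"abc")
    except TypeError:
        return True
    return False


print(same is exact)         # no copy for an exact str
print(type(copied) is str)   # the subtype instance became an exact str
print(rejects_bytes())       # no implicit decoding of bytes in Python 3
```

The copy made for the subtype instance is the cost in question: when the caller only needs the UTF-8 representation or the raw data, the original object would have served just as well.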
The proposed patch does the following:
1. Avoids unneeded copying of the string's content.
2. Avoids raising some unneeded exceptions.
3. Gets rid of unneeded incref/decref.
4. Makes some error messages more correct or informative.
5. Converts runtime PyBytes_Check() checks on the results of string encoding into asserts.
Example of performance gain:
Unpatched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.404 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.723 usec per loop
Patched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.383 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.387 usec per loop
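The same comparison can be scripted with the timeit module instead of the command line. This is a minimal sketch, not part of the patch: the helper best_usec() and its default loop counts are assumptions chosen for illustration, and the absolute numbers depend on the build and the machine.

```python
import timeit


class S(str):  # hypothetical str subtype, as in the commands above
    pass


def best_usec(a, b, number=100_000, repeat=3):
    """Best per-loop time of `a in b`, in microseconds."""
    timer = timeit.Timer("a in b", globals={"a": a, "b": b})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


exact = best_usec("a" * 100, "b" * 1000)
sub = best_usec(S("a" * 100), S("b" * 1000))

print(f"exact str:   {exact:.3f} usec per loop")
print(f"str subtype: {sub:.3f} usec per loop")
```

Running it on an unpatched and a patched build side by side shows whether the gap between the two lines closes.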
Date | User | Action | Args
2016-01-09 08:41:42 | serhiy.storchaka | set | recipients: + serhiy.storchaka, ncoghlan, vstinner
2016-01-09 08:41:41 | serhiy.storchaka | set | messageid: <1452328901.15.0.65828251852.issue26057@psf.upfronthosting.co.za>
2016-01-09 08:41:40 | serhiy.storchaka | link | issue26057 messages
2016-01-09 08:41:40 | serhiy.storchaka | create |