
Author serhiy.storchaka
Recipients ncoghlan, serhiy.storchaka, vstinner
Date 2016-01-09.08:41:31
Message-id <1452328901.15.0.65828251852.issue26057@psf.upfronthosting.co.za>
Content
In Python 2, PyUnicode_FromObject() was used to coerce 8-bit strings to unicode by decoding them with the default encoding. Python 3 has no such coercion. In Python 3, PyUnicode_FromObject() only ensures that the argument is a string and converts an instance of a str subclass to an exact str. The latter is often just a waste of memory and time, since the resulting string is used only to retrieve its UTF-8 representation or its raw data.
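To make the Python 3 behaviour concrete, here is a rough pure-Python model of it. from_object_model is a hypothetical helper, not a CPython API, and the sketch relies on the CPython detail that str.__str__ returns a new exact str when called on a subclass instance.

```python
class S(str):
    """Trivial str subclass, as in the timeit examples below."""


def from_object_model(obj):
    """Hypothetical Python-level model of PyUnicode_FromObject() in Python 3.

    - exact str: returned unchanged (the C function just increfs it)
    - str subclass: its content is copied into a new exact str
    - anything else: TypeError (no Python 2 style coercion)
    """
    if type(obj) is str:
        return obj
    if isinstance(obj, str):
        # In CPython, str.__str__ copies a subclass instance into an exact str.
        return str.__str__(obj)
    raise TypeError(f"expected str, got {type(obj).__name__}")


s = S("a" * 100)
converted = from_object_model(s)
# A new exact-str object was allocated and the 100 characters were copied...
assert type(converted) is str and converted is not s
# ...yet its UTF-8 representation is identical to the original's, so the
# copy is wasted work when only the encoded data is needed.
assert converted.encode("utf-8") == s.encode("utf-8")
```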

The proposed patch does the following:

1. Avoids unneeded copying of string's content.
2. Avoids raising some unneeded exceptions.
3. Gets rid of unneeded incref/decref.
4. Makes some error messages more accurate or informative.
5. Converts runtime PyBytes_Check() checks on the results of string encoding into asserts.
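Items 1 and 5 can be loosely illustrated at the Python level with another hypothetical helper, utf8_of. It only checks that the argument is a string (exact or subclass), encodes it directly without first building an exact str, and treats "the result is bytes" as an invariant rather than an error path, since str.encode() is documented to return bytes. This is a sketch of the idea, not the patch's C code.

```python
def utf8_of(obj):
    """Hypothetical helper: UTF-8 bytes of any str or str subclass instance."""
    if not isinstance(obj, str):
        # Type check only; no conversion to an exact str is performed.
        raise TypeError(f"expected str, got {type(obj).__name__}")
    # Works directly on subclass instances: no intermediate exact-str copy.
    data = obj.encode("utf-8")
    # str.encode() returns bytes, so this is an invariant, not a runtime check.
    assert isinstance(data, bytes)
    return data


class S(str):
    pass


assert utf8_of(S("h\u00e9llo")) == "h\u00e9llo".encode("utf-8")
```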

Example of the performance gain for the substring test "a in b", with exact str operands and with str subclass operands:

Unpatched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.404 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.723 usec per loop

Patched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.383 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.387 usec per loop
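The command lines above can also be driven from a script with the standard timeit module, which is convenient for running the same comparison against an unpatched and a patched build. SETUPS and best_usec are illustrative names; absolute timings will differ by machine and build.

```python
import timeit

# Setups copied from the command lines above; multiple -s options are joined
# with a newline, which is what the timeit command-line interface does.
SETUPS = {
    "exact str": "a = 'a'*100; b = 'b'*1000",
    "str subclass": "class S(str): pass\na = S('a'*100); b = S('b'*1000)",
}


def best_usec(setup, number=1_000_000, repeat=3):
    """Best-of-`repeat` time per execution of `a in b`, in microseconds."""
    times = timeit.repeat("a in b", setup=setup, repeat=repeat, number=number)
    return min(times) / number * 1e6


if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(f"{name}: {best_usec(setup):.3f} usec per loop")
```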
History
Date                 User              Action  Args
2016-01-09 08:41:42  serhiy.storchaka  set     recipients: + serhiy.storchaka, ncoghlan, vstinner
2016-01-09 08:41:41  serhiy.storchaka  set     messageid: <1452328901.15.0.65828251852.issue26057@psf.upfronthosting.co.za>
2016-01-09 08:41:40  serhiy.storchaka  link    issue26057 messages
2016-01-09 08:41:40  serhiy.storchaka  create