classification
Title: Avoid nonneeded use of PyUnicode_FromObject()
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: haypo, martin.panter, ncoghlan, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-01-09 08:41 by serhiy.storchaka, last changed 2016-04-14 09:31 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
no_unicode_copy.patch serhiy.storchaka, 2016-01-09 08:41 review
no_unicode_copy_2.patch serhiy.storchaka, 2016-04-10 10:51 review
Messages (8)
msg257806 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-09 08:41
In Python 2, PyUnicode_FromObject() was used for coercing 8-bit strings to unicode by decoding them with the default encoding. But in Python 3 there is no such coercion. The effect of PyUnicode_FromObject() in Python 3 is to ensure that the argument is a string and to convert an instance of a str subtype to an exact str. The latter is often just a waste of memory and time, since the resulting string is used only for retrieving its UTF-8 representation or raw data.
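
The behaviour described above can be observed from Python itself. The following is a minimal sketch, assuming a CPython interpreter where ctypes.pythonapi exposes the C API; the subclass name S and the variable names are illustrative only and are not part of the patch.

```python
import ctypes

# Assumption: CPython, where ctypes.pythonapi gives access to the C API.
from_object = ctypes.pythonapi.PyUnicode_FromObject
from_object.argtypes = [ctypes.py_object]
from_object.restype = ctypes.py_object


class S(str):
    """Hypothetical str subclass, like the one used in the timings."""


exact = "spam" * 10
sub = S("spam" * 10)

# Exact str: the very same object comes back (only a new reference).
same = from_object(exact)
print("exact str returned as-is:", same is exact)

# str subclass: the contents are copied into a new exact str.
copied = from_object(sub)
print("subclass converted to exact str:", type(copied) is str)
print("equal contents:", copied == sub)
print("same object:", copied is sub)

# Anything that is not a str at all is rejected with TypeError.
try:
    from_object(b"spam")
except TypeError as exc:
    print("TypeError:", exc)
```

The copy made in the subclass case is the overhead that the patch avoids wherever the caller only reads the string's data.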

The proposed patch does the following:

1. Avoids unneeded copying of string's content.
2. Avoids raising some unneeded exceptions.
3. Gets rid of unneeded incref/decref.
4. Makes some error messages more correct or informative.
5. Converts runtime checks PyBytes_Check() for results of string encoding to asserts.

Example of performance gain:

Unpatched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.404 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.723 usec per loop

Patched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.383 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.387 usec per loop
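
The same comparison can be scripted with the timeit module instead of the command line. This is a rough sketch only: the helper name best_usec and the reduced loop count are assumptions made here for convenience, and the absolute numbers will differ by machine and Python version.

```python
import timeit

# Setup snippets mirroring the command-line runs above.
SETUP_EXACT = "a = 'a'*100; b = 'b'*1000"
SETUP_SUBCLASS = "class S(str): pass\na = S('a'*100); b = S('b'*1000)"


def best_usec(setup, stmt="a in b", number=200_000, repeat=3):
    """Return the best per-loop time in microseconds (hypothetical helper)."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


exact_usec = best_usec(SETUP_EXACT)
subclass_usec = best_usec(SETUP_SUBCLASS)

print(f"exact str:    {exact_usec:.3f} usec per loop")
print(f"str subclass: {subclass_usec:.3f} usec per loop")
print(f"subclass / exact ratio: {subclass_usec / exact_usec:.2f}")
```

On an interpreter that includes the patch, the ratio is expected to be close to 1, in line with the patched timings reported above.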
msg257807 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-09 09:02
See also issue15984 about correcting the documentation of PyUnicode_FromObject().
msg263125 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-10 06:11
Left some comments
msg263128 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-10 10:51
Updated patch addresses Martin's comments.
msg263326 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-13 11:33
Apart from one redundancy (see review), this looks good to me.
msg263332 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-13 12:44
New changeset 3f3b3d4881f6 by Serhiy Storchaka in branch 'default':
Issue #26057: Got rid of nonneeded use of PyUnicode_FromObject().
https://hg.python.org/cpython/rev/3f3b3d4881f6
msg263333 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-13 12:45
Thank you for your review, Martin.
msg263392 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-14 09:31
New changeset 19dec08e54a8 by Serhiy Storchaka in branch 'default':
Issues #26716, #26057: Regenerate Argument Clinic code.
https://hg.python.org/cpython/rev/19dec08e54a8
History
Date                 User              Action  Args
2016-04-14 09:31:23  python-dev        set     messages: + msg263392
2016-04-13 12:45:58  serhiy.storchaka  set     status: open -> closed; messages: + msg263333; assignee: serhiy.storchaka; resolution: fixed; stage: patch review -> resolved
2016-04-13 12:44:38  python-dev        set     nosy: + python-dev; messages: + msg263332
2016-04-13 11:33:50  martin.panter     set     messages: + msg263326
2016-04-10 10:51:23  serhiy.storchaka  set     files: + no_unicode_copy_2.patch; messages: + msg263128
2016-04-10 06:11:16  martin.panter     set     nosy: + martin.panter; messages: + msg263125
2016-01-09 09:02:23  serhiy.storchaka  set     messages: + msg257807
2016-01-09 08:41:41  serhiy.storchaka  create