classification
Title: [PATCH] remove bogus codepath from pprint._safe_repr
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: mvyskocil, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-08-08 08:29 by mvyskocil, last changed 2015-05-16 18:42 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
pprint-remove-bogus-code.patch mvyskocil, 2013-08-08 08:29 proposed patch, which apply on current hg master (f871f8662509)
check.py mvyskocil, 2013-08-08 08:32 check a (in)sensitiveness of .isalnum()
pprint_safe_repr_scalars.patch serhiy.storchaka, 2013-08-14 19:57 review
Messages (11)
msg194652 - (view) Author: Michal Vyskocil (mvyskocil) Date: 2013-08-08 08:29
pprint._safe_repr for type str uses much slower codepath by default, which does not makes a sense in Python3 context. Instead of simply using repr, it check the existence of 'locale' in sys.modules and if found, it goes one-by-one-char call str.isalpha() on each and apply the quoting for non-alpha chars. This is extremely slow, but as locale is usually in sys.modules, it's used by default.

The point of such code was because in python2, str.isalpha() depends on locale, so for locale-aware Python builds, there was a different path needed. But this does not apply for Python3, where all strings are unicode, so .isaplha() is not locale sensitive anymore.
msg194653 - (view) Author: Michal Vyskocil (mvyskocil) Date: 2013-08-08 08:31
This is simple code checks if .isalnum is or is not locale sensitive and a small measurement of how much is the repr faster, compared to old codepath.

BTW: python3 test_pprint.py on patched version have succeeded

OK (expected failures=1)
msg195154 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-14 13:29
Ran some benchmark numbers:

$ ./python -m timeit -s "from pprint import saferepr; s='é\"x' * 1000" "saferepr(s)"
-> before patch: 555 usec per loop
-> after patch: 10.9 usec per loop

$ ./python -m timeit -s "from pprint import saferepr; s='xxx' * 1000" "saferepr(s)"
-> before patch: 290 usec per loop
-> after patch: 8.68 usec per loop
msg195159 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 13:48
We can just remove "if typ is str:" case at all. A general case works for strings.

But for performance (this will slowdown Antoine's tests by 15-20%) we should left it and may be even extend it to other builtin types:

builtin_type = (str, int, float, NoneType, list, tuple, dict, set, bytes): # or may be use a set?
...
    if typ in builtin_type:
        return repr(object), True, False
msg195208 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 19:57
Here is a patch which adds fast path for builtin scalar types.

Or we can just remove a special case for strings.
msg195237 - (view) Author: Michal Vyskocil (mvyskocil) Date: 2013-08-15 07:45
The fast scalars approach looks great!
msg197547 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-13 04:29
Antoine, what you say?
msg197559 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-13 08:35
Are there tests for all the "builtin scalars"?
msg198343 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-23 21:31
To made some sense in Python 3 context we should remove the check for the locale module and replace repr() by ascii().
msg243352 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-16 18:39
New changeset c77a42c234d6 by Serhiy Storchaka in branch 'default':
Issue #18682: Optimized pprint functions for builtin scalar types.
https://hg.python.org/cpython/rev/c77a42c234d6
msg243354 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-16 18:42
> Are there tests for all the "builtin scalars"?

Yes, now there are.
History
Date User Action Args
2015-05-16 18:42:02serhiy.storchakasetstatus: open -> closed
versions: + Python 3.5, - Python 3.4
messages: + msg243354

assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2015-05-16 18:39:42python-devsetnosy: + python-dev
messages: + msg243352
2013-09-23 21:31:34serhiy.storchakasetmessages: + msg198343
2013-09-13 08:35:57pitrousetmessages: + msg197559
2013-09-13 04:29:40serhiy.storchakasetmessages: + msg197547
2013-08-15 07:45:47mvyskocilsetmessages: + msg195237
2013-08-14 19:57:47serhiy.storchakasetfiles: + pprint_safe_repr_scalars.patch

messages: + msg195208
2013-08-14 13:48:25serhiy.storchakasetmessages: + msg195159
2013-08-14 13:29:17pitrousetversions: + Python 3.4
nosy: + serhiy.storchaka, pitrou

messages: + msg195154

stage: patch review
2013-08-08 08:32:53mvyskocilsetfiles: + check.py
2013-08-08 08:31:58mvyskocilsetmessages: + msg194653
2013-08-08 08:29:26mvyskocilcreate