Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PATCH] remove bogus codepath from pprint._safe_repr #62882

Closed
mvyskocil mannequin opened this issue Aug 8, 2013 · 11 comments
Closed

[PATCH] remove bogus codepath from pprint._safe_repr #62882

mvyskocil mannequin opened this issue Aug 8, 2013 · 11 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@mvyskocil
Copy link
Mannequin

mvyskocil mannequin commented Aug 8, 2013

BPO 18682
Nosy @pitrou, @serhiy-storchaka
Files
  • pprint-remove-bogus-code.patch: proposed patch, which apply on current hg master (f871f8662509)
  • check.py: check a (in)sensitiveness of .isalnum()
  • pprint_safe_repr_scalars.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-05-16.18:42:02.087>
    created_at = <Date 2013-08-08.08:29:26.326>
    labels = ['library', 'performance']
    title = '[PATCH] remove bogus codepath from pprint._safe_repr'
    updated_at = <Date 2015-05-16.18:42:02.084>
    user = 'https://bugs.python.org/mvyskocil'

    bugs.python.org fields:

    activity = <Date 2015-05-16.18:42:02.084>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-05-16.18:42:02.087>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2013-08-08.08:29:26.326>
    creator = 'mvyskocil'
    dependencies = []
    files = ['31193', '31194', '31292']
    hgrepos = []
    issue_num = 18682
    keywords = ['patch']
    message_count = 11.0
    messages = ['194652', '194653', '195154', '195159', '195208', '195237', '197547', '197559', '198343', '243352', '243354']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'mvyskocil', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18682'
    versions = ['Python 3.5']

    @mvyskocil
    Copy link
    Mannequin Author

    mvyskocil mannequin commented Aug 8, 2013

    pprint._safe_repr for type str uses much slower codepath by default, which does not makes a sense in Python3 context. Instead of simply using repr, it check the existence of 'locale' in sys.modules and if found, it goes one-by-one-char call str.isalpha() on each and apply the quoting for non-alpha chars. This is extremely slow, but as locale is usually in sys.modules, it's used by default.

    The point of such code was because in python2, str.isalpha() depends on locale, so for locale-aware Python builds, there was a different path needed. But this does not apply for Python3, where all strings are unicode, so .isaplha() is not locale sensitive anymore.

    @mvyskocil mvyskocil mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Aug 8, 2013
    @mvyskocil
    Copy link
    Mannequin Author

    mvyskocil mannequin commented Aug 8, 2013

    This is simple code checks if .isalnum is or is not locale sensitive and a small measurement of how much is the repr faster, compared to old codepath.

    BTW: python3 test_pprint.py on patched version have succeeded

    OK (expected failures=1)

    @pitrou
    Copy link
    Member

    pitrou commented Aug 14, 2013

    Ran some benchmark numbers:

    $ ./python -m timeit -s "from pprint import saferepr; s='é\"x' * 1000" "saferepr(s)"
    -> before patch: 555 usec per loop
    -> after patch: 10.9 usec per loop
    
    $ ./python -m timeit -s "from pprint import saferepr; s='xxx' * 1000" "saferepr(s)"
    -> before patch: 290 usec per loop
    -> after patch: 8.68 usec per loop

    @serhiy-storchaka
    Copy link
    Member

    We can just remove "if typ is str:" case at all. A general case works for strings.

    But for performance (this will slowdown Antoine's tests by 15-20%) we should left it and may be even extend it to other builtin types:

    builtin_type = (str, int, float, NoneType, list, tuple, dict, set, bytes): # or may be use a set?
    ...
        if typ in builtin_type:
            return repr(object), True, False

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which adds fast path for builtin scalar types.

    Or we can just remove a special case for strings.

    @mvyskocil
    Copy link
    Mannequin Author

    mvyskocil mannequin commented Aug 15, 2013

    The fast scalars approach looks great!

    @serhiy-storchaka
    Copy link
    Member

    Antoine, what you say?

    @pitrou
    Copy link
    Member

    pitrou commented Sep 13, 2013

    Are there tests for all the "builtin scalars"?

    @serhiy-storchaka
    Copy link
    Member

    To made some sense in Python 3 context we should remove the check for the locale module and replace repr() by ascii().

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 16, 2015

    New changeset c77a42c234d6 by Serhiy Storchaka in branch 'default':
    Issue bpo-18682: Optimized pprint functions for builtin scalar types.
    https://hg.python.org/cpython/rev/c77a42c234d6

    @serhiy-storchaka
    Copy link
    Member

    Are there tests for all the "builtin scalars"?

    Yes, now there are.

    @serhiy-storchaka serhiy-storchaka self-assigned this May 16, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants