
html.escape 10x slower than cgi.escape #62220

Closed
florentx mannequin opened this issue May 20, 2013 · 8 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@florentx
Mannequin

florentx mannequin commented May 20, 2013

BPO 18020
Nosy @akuchling, @orsenthil, @jwilk, @ezio-melotti, @florentx
Files
  • htmlescape.patch: Speed up html.escape()
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2013-07-07.09:12:39.800>
    created_at = <Date 2013-05-20.08:21:38.319>
    labels = ['library', 'performance']
    title = 'html.escape 10x slower than cgi.escape'
    updated_at = <Date 2013-07-07.09:12:39.797>
    user = 'https://github.com/florentx'

    bugs.python.org fields:

    activity = <Date 2013-07-07.09:12:39.797>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2013-07-07.09:12:39.800>
    closer = 'ezio.melotti'
    components = ['Library (Lib)']
    creation = <Date 2013-05-20.08:21:38.319>
    creator = 'flox'
    dependencies = []
    files = ['30325']
    hgrepos = []
    issue_num = 18020
    keywords = ['patch']
    message_count = 8.0
    messages = ['189641', '189643', '189644', '189647', '189711', '190267', '192527', '192528']
    nosy_count = 8.0
    nosy_names = ['akuchling', 'orsenthil', 'jwilk', 'ezio.melotti', 'grahamd', 'flox', 'python-dev', 'Teh Matt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18020'
    versions = ['Python 3.4']

    @florentx
    Mannequin Author

    florentx mannequin commented May 20, 2013

    I noticed that the convenient html.escape was added in Python 3.2 and that cgi.escape is now marked as deprecated.

    However, the former is an order of magnitude slower than the latter.

    $ python3 --version
    Python 3.3.2

    With html.escape:

    $ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = html(s)"
    10000 loops, best of 3: 48.7 usec per loop
    $ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = html(s)"
    1000 loops, best of 3: 898 usec per loop

    With cgi.escape:

    $ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = escape(s)"
    100000 loops, best of 3: 7.42 usec per loop
    $ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = escape(s)"
    10000 loops, best of 3: 21.5 usec per loop
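The slowdown factors implied by the timings above can be checked with a few lines of arithmetic. This sketch only divides the per-loop figures quoted in the report; it does not re-run any benchmark.

```python
# Per-loop timings (usec) copied from the timeit runs above; nothing is re-measured.
timings = {
    "repr(copyright)": {"html.escape": 48.7, "cgi.escape": 7.42},
    "repr(copyright) * 19": {"html.escape": 898.0, "cgi.escape": 21.5},
}


def slowdown(row):
    """How many times slower html.escape was than cgi.escape for one input."""
    return row["html.escape"] / row["cgi.escape"]


for label, row in timings.items():
    print(f"{label}: html.escape is {slowdown(row):.1f}x slower")
```

This gives roughly 6.6x for the short string and 41.8x for the 19x-longer string, so in these runs the gap widened as the input grew.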

    Since this kind of function is called frequently in template engines, the difference matters.
    Of course, C replacements are available on PyPI, such as MarkupSafe or Webext.

    But it would be nice to restore the performance of cgi.escape with a pragmatic .replace() approach.
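A minimal sketch of the pragmatic .replace() approach, assuming the goal is to keep html.escape's semantics (quote=True by default, escaping both double and single quotes). The name replace_escape is hypothetical, and this is not necessarily the patch that was later committed.

```python
import html


def replace_escape(s, quote=True):
    """Hypothetical replace-based escape mirroring html.escape's output."""
    # '&' must be replaced first; otherwise the '&' at the start of the
    # entities produced below would be escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


sample = '<a href="x?a=1&b=2">Tom\'s & Jerry\'s</a>'

# The sketch should agree with the standard library for both quote settings.
assert replace_escape(sample) == html.escape(sample)
assert replace_escape(sample, quote=False) == html.escape(sample, quote=False)
print(replace_escape(sample))
```

Each str.replace call is a single C-level scan of the string, which is why a short chain of them can be cheap for typical template input.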

    @florentx florentx mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels May 20, 2013
    @grahamd
    Mannequin

    grahamd mannequin commented May 20, 2013

    Importing the cgi module for the first time has always been very expensive, even in Python 2.x. I would suggest redoing the test with the timing done inside the script, after the modules have been imported, so that module import time is properly separated from the execution time of each function.

    @florentx
    Mannequin Author

    florentx mannequin commented May 20, 2013

    > I would suggest you redo the test using timing done inside of the script after modules have been imported.

    The -s switch takes care of this.
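For reference, the in-script equivalent of the -s switch is the setup argument of the timeit module: the setup code runs before the timed loop and is excluded from the measurement, so module import cost does not leak into the per-call figure. A minimal sketch; the sample string is an arbitrary assumption, not the input used in the report.

```python
import timeit

# Runs before the timed loop and is not measured, like `python -m timeit -s ...`.
setup = "from html import escape; s = '<a href=\"x\">Tom & Jerry</a>' * 19"
stmt = "escape(s)"

number = 10000
# repeat() returns one total time in seconds per run of `number` loops.
totals = timeit.repeat(stmt, setup=setup, repeat=3, number=number)
best_usec = min(totals) / number * 1e6
print(f"best of 3: {best_usec:.2f} usec per loop")
```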

    @grahamd
    Mannequin

    grahamd mannequin commented May 20, 2013

    Whoops. Missed the quoting.

    @TehMatt
    Mannequin

    TehMatt mannequin commented May 20, 2013

    I did a few more tests and am seeing the same speed differences Florent noticed.
    It seems reasonable to use .replace() instead, as it does the same thing significantly faster.
    I've attached a patch doing just this.
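A sketch of the kind of comparison described here. translate_escape is a hypothetical reconstruction of a str.translate()-based escape that uses two module-level mapping dicts (suggested by the later remark that the patch removes two such dicts), not the verbatim pre-patch source; replace_escape is the chained .replace() version. Both should produce identical output. Relative timings depend on the interpreter version, so no particular result is assumed.

```python
import timeit

# Hypothetical translate-based version: two module-level dicts mapping
# code points to entities (an assumption, not the exact stdlib source).
_escape_map = {ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;"}
_escape_map_full = {**_escape_map, ord('"'): "&quot;", ord("'"): "&#x27;"}


def translate_escape(s, quote=True):
    # str.translate looks up every character of s in the mapping.
    return s.translate(_escape_map_full if quote else _escape_map)


def replace_escape(s, quote=True):
    s = s.replace("&", "&amp;")  # must come first to avoid double-escaping
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


sample = '<a href="x">Tom\'s & Jerry\'s</a>' * 19

# "Does the same thing": outputs must match before timings mean anything.
assert translate_escape(sample) == replace_escape(sample)
assert translate_escape(sample, quote=False) == replace_escape(sample, quote=False)

number = 2000
for func in (translate_escape, replace_escape):
    best = min(timeit.repeat(lambda: func(sample), repeat=3, number=number))
    print(f"{func.__name__}: {best / number * 1e6:.1f} usec per call")
```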

    @akuchling
    Member

    Matt's patch looks good to me. It removes two module-level dicts, but they're marked as internal, so that's OK. There's already a test case that exercises html.escape(), so I don't think any additional tests are needed.

    @ezio-melotti ezio-melotti self-assigned this Jun 1, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 7, 2013

    New changeset db5f2b74e369 by Ezio Melotti in branch 'default':
    bpo-18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant.
    http://hg.python.org/cpython/rev/db5f2b74e369

    @ezio-melotti
    Member

    Fixed, thanks for the report and the patch!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022