classification
Title: html.escape 10x slower than cgi.escape
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: Teh Matt, akuchling, ezio.melotti, flox, grahamd, jwilk, orsenthil, python-dev
Priority: normal Keywords: patch

Created on 2013-05-20 08:21 by flox, last changed 2013-07-07 09:12 by ezio.melotti. This issue is now closed.

Files
File name Uploaded Description
htmlescape.patch Teh Matt, 2013-05-20 23:05 Speed up html.escape()
Messages (8)
msg189641 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2013-05-20 08:21
I noticed the convenient ``html.escape`` added in Python 3.2, and that ``cgi.escape`` is now marked as deprecated.


However, the former is an order of magnitude slower than the latter.

$ python3 --version
Python 3.3.2


With html.escape:

$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = html(s)"
10000 loops, best of 3: 48.7 usec per loop
$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = html(s)"
1000 loops, best of 3: 898 usec per loop

With cgi.escape:

$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright)" "h = escape(s)"
100000 loops, best of 3: 7.42 usec per loop
$ python3 -m timeit -s "from html import escape as html; from cgi import escape; s = repr(copyright) * 19" "h = escape(s)"
10000 loops, best of 3: 21.5 usec per loop


Since this kind of function is called frequently in template engines, it makes a difference.
Of course, C replacements are available on PyPI: MarkupSafe or Webext.

But it would be nice to restore the performance of cgi.escape with a pragmatic `.replace()` approach.
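
A minimal sketch of what such a `.replace()`-based approach could look like, assuming the same escaping rules as `html.escape` (the three markup characters, plus both quote characters when `quote` is true). The name `escape_with_replace` is hypothetical, and this is an illustration rather than the patch attached to this issue.

```python
import html


def escape_with_replace(s, quote=True):
    """Escape &, < and > (plus quotes if `quote` is true) via chained str.replace."""
    # "&" must be handled first so the entities added below are not re-escaped.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
        s = s.replace("'", "&#x27;")
    return s


sample = '<a href="x">Tom & Jerry\'s</a>'
print(escape_with_replace(sample))
# Sanity check against the standard library function.
print(escape_with_replace(sample) == html.escape(sample))
```

The ordering matters: replacing `&` last would turn the `&` in `&lt;` into `&amp;lt;`.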
msg189643 - (view) Author: Graham Dumpleton (grahamd) Date: 2013-05-20 08:53
Importing the cgi module the first time even in Python 2.X was always very expensive. I would suggest you redo the test using timing done inside of the script after modules have been imported so as to properly separate module import time in both cases from execution time of the specific function.
msg189644 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2013-05-20 09:06
> I would suggest you redo the test using timing done inside of the script after modules have been imported.

The -s switch takes care of this.
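
For reference, the same separation can be reproduced from inside a script with the `timeit` module, whose `setup` argument plays the role of `-s`: it runs before the timed loop and is not included in the measurement. This is only a sketch; the helper name `best_usec` and the test string are assumptions, not part of the original benchmark.

```python
import timeit

# The setup statement runs once per timing run and is excluded from the
# measurement, so module import cost does not affect the reported numbers.
SETUP = "from html import escape; s = '<p class=\"x\">Tom & Jerry</p>' * 100"


def best_usec(stmt, setup=SETUP, number=1000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


print("html.escape: %.2f usec per loop" % best_usec("escape(s)"))
```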
msg189647 - (view) Author: Graham Dumpleton (grahamd) Date: 2013-05-20 10:14
Whoops. Missed the quoting.
msg189711 - (view) Author: Matt Bryant (Teh Matt) * Date: 2013-05-20 23:05
I did a few more tests and am seeing the same speed differences Florent noticed.
It seems reasonable to use .replace() instead, as it does the same thing significantly faster.
I've attached a patch doing just this.
msg190267 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2013-05-29 02:30
Matt's patch looks good to me.  It removes two module-level dicts, but they're marked as internal, so that's OK.  There's already a test case that exercises html.escape(), so I don't think any additional tests are needed.
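
For context, a table-driven version built on `str.translate` with two module-level dicts might look like the sketch below. The names `_ESCAPE_MAP`, `_ESCAPE_MAP_FULL` and `escape_with_translate` are hypothetical, and the code is an illustration of the general technique, not a copy of what the patch removed. The leading underscore is the usual convention for marking such module-level objects as internal.

```python
import html

# Internal lookup tables keyed by code point, as str.translate expects.
_ESCAPE_MAP = {ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;"}
_ESCAPE_MAP_FULL = dict(_ESCAPE_MAP)
_ESCAPE_MAP_FULL.update({ord('"'): "&quot;", ord("'"): "&#x27;"})


def escape_with_translate(s, quote=True):
    """Escape markup characters with a single str.translate pass."""
    table = _ESCAPE_MAP_FULL if quote else _ESCAPE_MAP
    return s.translate(table)


sample = '<b title="x">R&D</b>'
# Both approaches should produce the same output for the same input.
print(escape_with_translate(sample) == html.escape(sample))
```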
msg192527 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 09:11
New changeset db5f2b74e369 by Ezio Melotti in branch 'default':
#18020: improve html.escape speed by an order of magnitude.  Patch by Matt Bryant.
http://hg.python.org/cpython/rev/db5f2b74e369
msg192528 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-07-07 09:12
Fixed, thanks for the report and the patch!
History
Date User Action Args
2013-07-07 09:12:39 ezio.melotti set status: open -> closed
versions: + Python 3.4, - Python 3.2, Python 3.3
messages: + msg192528
resolution: fixed
stage: patch review -> resolved
2013-07-07 09:11:36 python-dev set nosy: + python-dev
messages: + msg192527
2013-06-01 13:33:53 ezio.melotti set assignee: ezio.melotti
stage: patch review
2013-05-29 02:30:26 akuchling set nosy: + akuchling
messages: + msg190267
2013-05-25 15:42:51 jwilk set nosy: + jwilk
2013-05-20 23:05:30 Teh Matt set files: + htmlescape.patch
nosy: + Teh Matt
messages: + msg189711
keywords: + patch
2013-05-20 10:14:05 grahamd set messages: + msg189647
2013-05-20 09:06:38 flox set messages: + msg189644
2013-05-20 08:53:52 grahamd set nosy: + grahamd
messages: + msg189643
2013-05-20 08:21:38 flox create