This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author ezio.melotti
Recipients SilentGhost, amaury.forgeotdarc, belopolsky, benjamin.peterson, bjourne, donlorenzo, ezio.melotti, foom, georg.brandl, mortenlj, mrabarnett, pitrou, python-dev, rsc, swamiyeswanth, timehorse, zanella
Date 2011-03-25.12:57:25
SpamBayes Score 2.56639e-08
Marked as misclassified No
Message-id <1301057847.15.0.983637864081.issue2650@psf.upfronthosting.co.za>
In-reply-to
Content
I did a few more tests and using a re.sub seems indeed slower (the implementation is just 4 lines though, and it's more readable):

wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern = re.compile("([^\x00a-zA-Z0-9])")' 'escape_pattern.sub(r"\\\1", string.printable).replace("\x00", "\\000")'
1000 loops, best of 3: 219 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string' 're.escape(string.printable)'
10000 loops, best of 3: 59.3 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -c 'import re,string; escape_pattern = re.compile("([^\x00a-zA-Z0-9])"); print(escape_pattern.sub(r"\\\1", string.printable).replace("\x00", "\\000") == re.escape(string.printable))'
True

wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern = re.compile(b"([^\x00a-zA-Z0-9])"); s = string.printable.encode("ascii")' 'escape_pattern.sub(br"\\\1", s).replace(b"\x00", b"\\000")'
1000 loops, best of 3: 231 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; s = string.printable.encode("ascii")' 're.escape(s)'
10000 loops, best of 3: 73.2 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -c 'import re,string; escape_pattern = re.compile(b"([^\x00a-zA-Z0-9])"); s = string.printable.encode("ascii"); print(escape_pattern.sub(br"\\\1", s).replace(b"\x00", b"\\000") == re.escape(s))'
True

The .replace() doesn't seem to affect the affect the speed in any significant way.
I also did a few more tests:
 1) using enumerate();
 2) like 1) but also moving \x00 in the set of alnum chars, removing the "if c == '\000'" from the loop and using .replace("\x00", "\\000") on the joined string;
 3) like 2) but also moving the loop in a genexp inside the join();

1) is the fastest (10-15% faster than the original), 2) is pretty much the same speed of 1), and 3) is slower, so I just changed re.escape to use enumerate() and refactored its tests in 2.7/3.1/3.2/3.3.
History
Date User Action Args
2011-03-25 12:57:27ezio.melottisetrecipients: + ezio.melotti, georg.brandl, amaury.forgeotdarc, belopolsky, foom, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, bjourne, mortenlj, mrabarnett, SilentGhost, swamiyeswanth, python-dev
2011-03-25 12:57:27ezio.melottisetmessageid: <1301057847.15.0.983637864081.issue2650@psf.upfronthosting.co.za>
2011-03-25 12:57:26ezio.melottilinkissue2650 messages
2011-03-25 12:57:25ezio.melotticreate