Message132085
I did a few more tests, and using re.sub does indeed seem slower (although the implementation is just 4 lines and is more readable):
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern = re.compile("([^\x00a-zA-Z0-9])")' 'escape_pattern.sub(r"\\\1", string.printable).replace("\x00", "\\000")'
1000 loops, best of 3: 219 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string' 're.escape(string.printable)'
10000 loops, best of 3: 59.3 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -c 'import re,string; escape_pattern = re.compile("([^\x00a-zA-Z0-9])"); print(escape_pattern.sub(r"\\\1", string.printable).replace("\x00", "\\000") == re.escape(string.printable))'
True
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern = re.compile(b"([^\x00a-zA-Z0-9])"); s = string.printable.encode("ascii")' 'escape_pattern.sub(br"\\\1", s).replace(b"\x00", b"\\000")'
1000 loops, best of 3: 231 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; s = string.printable.encode("ascii")' 're.escape(s)'
10000 loops, best of 3: 73.2 usec per loop
wolf@hp:~/dev/py/3.1$ ./python -c 'import re,string; escape_pattern = re.compile(b"([^\x00a-zA-Z0-9])"); s = string.printable.encode("ascii"); print(escape_pattern.sub(br"\\\1", s).replace(b"\x00", b"\\000") == re.escape(s))'
True
The .replace() doesn't seem to affect the speed in any significant way.
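A minimal sketch of the re.sub approach timed above, wrapped in a function for both str and bytes. The name escape_with_sub and the module-level pattern names are made up for illustration, and this is not necessarily the exact 4-line version that was benchmarked:

```python
import re

# Match every character except NUL and the ASCII alphanumerics.
# NUL is excluded here so it can be handled separately by .replace().
_ESCAPE_STR = re.compile("([^\x00a-zA-Z0-9])")
_ESCAPE_BYTES = re.compile(b"([^\x00a-zA-Z0-9])")


def escape_with_sub(pattern):
    """Backslash-escape all non-alphanumeric characters (hypothetical helper)."""
    if isinstance(pattern, str):
        # r"\\\1" expands to a literal backslash followed by the matched char.
        escaped = _ESCAPE_STR.sub(r"\\\1", pattern)
        return escaped.replace("\x00", "\\000")
    escaped = _ESCAPE_BYTES.sub(br"\\\1", pattern)
    return escaped.replace(b"\x00", b"\\000")
```

The patterns are compiled once at import time, which mirrors the -s setup part of the timeit commands, so only the substitution and the .replace() call are on the hot path.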
I also did a few more tests:
1) using enumerate();
2) like 1) but also moving \x00 into the set of alnum chars, removing the "if c == '\000'" from the loop and using .replace("\x00", "\\000") on the joined string;
3) like 2) but also moving the loop into a genexp inside the join();
1) is the fastest (10-15% faster than the original), 2) is pretty much the same speed as 1), and 3) is slower, so I just changed re.escape to use enumerate() and refactored its tests in 2.7/3.1/3.2/3.3.
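The three variants can be sketched roughly as follows for the str case. This is an illustration under assumptions, not the committed code: the function names and the _ALNUM / _ALNUM_NUL sets are hypothetical, and the alphanumeric set is taken from the [a-zA-Z0-9] class used in the timings above.

```python
import string

# Hypothetical helper sets; the real module may define these differently.
_ALNUM = frozenset(string.ascii_letters + string.digits)
_ALNUM_NUL = _ALNUM | {"\x00"}


def escape_enumerate(pattern):
    """Variant 1: enumerate() loop with the NUL special case inside it."""
    s = list(pattern)
    for i, c in enumerate(pattern):
        if c not in _ALNUM:
            if c == "\000":
                s[i] = "\\000"
            else:
                s[i] = "\\" + c
    return "".join(s)


def escape_replace(pattern):
    """Variant 2: NUL is skipped in the loop and fixed up by .replace()."""
    s = list(pattern)
    for i, c in enumerate(pattern):
        if c not in _ALNUM_NUL:
            s[i] = "\\" + c
    return "".join(s).replace("\x00", "\\000")


def escape_genexp(pattern):
    """Variant 3: same as variant 2, with the loop as a genexp in join()."""
    joined = "".join(c if c in _ALNUM_NUL else "\\" + c for c in pattern)
    return joined.replace("\x00", "\\000")
```

All three produce the same output for the same input, so they can be timed against each other with timeit in the same way as the commands above.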
Date | User | Action | Args
2011-03-25 12:57:27 | ezio.melotti | set | recipients: + ezio.melotti, georg.brandl, amaury.forgeotdarc, belopolsky, foom, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, bjourne, mortenlj, mrabarnett, SilentGhost, swamiyeswanth, python-dev
2011-03-25 12:57:27 | ezio.melotti | set | messageid: <1301057847.15.0.983637864081.issue2650@psf.upfronthosting.co.za>
2011-03-25 12:57:26 | ezio.melotti | link | issue2650 messages
2011-03-25 12:57:25 | ezio.melotti | create |