Message130821
I think these are two different questions:
1. What to escape
2. What to do about the poor performance of re.escape when it is implemented with re.sub
In my opinion, there isn't any justifiable reason to escape non-meta characters: it doesn't affect matching; escaped strings are typically just re-used in regex.
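As a small illustration of the point that escaping non-meta characters doesn't affect matching, here is a minimal sketch. The sample strings and variable names are made up for the example and are not from the issue:

```python
import re

text = 'a-b_c'

# Hypothetical patterns: one escapes the non-meta characters '-' and '_',
# the other leaves them alone.
over_escaped = r'a\-b\_c'
minimal = 'a-b_c'

# Both patterns match exactly the same text.
same = (re.fullmatch(over_escaped, text) is not None
        and re.fullmatch(minimal, text) is not None)
print(same)
```

The only visible difference is that the over-escaped pattern is longer and harder to read.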
I would favour the simpler and cleaner code that uses re.sub. I don't think that re.escape could be a performance bottleneck in any application. I did some profiling with python3.2, and the poor performance seems to come from the many abstraction layers that a re.sub call goes through. However, we need to bear in mind that we're only talking about a 40 usec difference for a 100-char string (string.printable); I'd think that the strings being escaped are typically shorter.
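For anyone who wants to repeat that kind of measurement, a rough sketch with timeit could look like the following. The helper name time_per_call_usec and the call count are my own choices for illustration, and the numbers will vary by machine and Python version:

```python
import re
import string
import timeit

def time_per_call_usec(func, arg, number=10000):
    """Return the average time of func(arg) in microseconds (hypothetical helper)."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number * 1e6

# string.printable is the 100-char test string mentioned above.
usec = time_per_call_usec(re.escape, string.printable)
print('re.escape on string.printable: %.2f usec per call' % usec)
```

Passing a different callable as func makes it possible to compare an alternative escape implementation on the same input.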
As a compromise, I tested this code:
    _mp = {ord(i): '\\' + i for i in '][.^$*+?{}\\|()'}
    def escape(pattern):
        if isinstance(pattern, str):
            return pattern.translate(_mp)
        return sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)
which is fast (faster than existing code) for str and slow for bytes patterns.
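The snippet above is written as if it lived inside re.py, where sub is already defined. A self-contained sketch of the same idea, runnable outside the module, might look like this. The names escape_sketch, _SPECIAL, _STR_MAP and _BYTES_RE are chosen here for illustration, and the character class is written with explicit backslashes for readability:

```python
import re

# Metacharacters taken from the snippet above; everything else is left alone.
_SPECIAL = '][.^$*+?{}\\|()'
_STR_MAP = {ord(c): '\\' + c for c in _SPECIAL}
_BYTES_RE = re.compile(br'([\]\[.^$*+?{}\\|()])')

def escape_sketch(pattern):
    """Escape only regex metacharacters in a str or bytes pattern."""
    if isinstance(pattern, str):
        # Fast path: a single str.translate call with a precomputed table.
        return pattern.translate(_STR_MAP)
    # bytes path: falls back to a regex substitution.
    return _BYTES_RE.sub(br'\\\1', pattern)

print(escape_sketch('a.b*c'))
print(escape_sketch(b'a.b*c'))
```

The str branch avoids the regex machinery entirely, which is where the speed difference between the two branches comes from.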
I don't particularly like it, because of the difference between str and bytes handling, but I do think that it will be much easier to "fix" once/when/if the re module is improved.
Date | User | Action | Args
2011-03-14 14:46:48 | SilentGhost | set | recipients: + SilentGhost, georg.brandl, amaury.forgeotdarc, belopolsky, foom, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, ezio.melotti, bjourne, mortenlj, mrabarnett, swamiyeswanth
2011-03-14 14:46:48 | SilentGhost | set | messageid: <1300114008.2.0.793469689705.issue2650@psf.upfronthosting.co.za>
2011-03-14 14:46:47 | SilentGhost | link | issue2650 messages
2011-03-14 14:46:47 | SilentGhost | create |