Message92548
In my app, I need to transform the regexp created from user input so
that Unicode characters and their ASCII equivalents match each other. For
example, if someone searches for "el nino", that should match the string
"el ñino". Similarly, searching for "el ñino" should match "el nino".
The code to transform the regexp looks like this:
import re
s = re.escape(user_input)
# pass the string to transform; no '|' inside the character class
s = re.sub(u'n|ñ', u'[nñ]', s)
matches = re.findall(s, data, re.IGNORECASE | re.UNICODE)
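It doesn't work because the ñ in user_input is escaped with a
backslash, so the substituted character class ends up behind that
backslash and its opening bracket is taken literally. My workaround, to
compensate for re.escape's too eager escaping, is to escape the re.sub
pattern: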
# match a plain n, or the backslash-escaped ñ that re.escape produced
s = re.sub(u'n|\\\\ñ', u'[nñ]', s)
It works but is not very nice. It would have been much better if
re.escape worked as one would expect in the first place.
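One way to avoid depending on how re.escape treats non-ASCII characters is to build the pattern one character at a time, so the character classes are never passed through re.escape at all. The following is only a sketch under a few assumptions: EQUIVALENTS and build_pattern are hypothetical names, not part of the re module, and each equivalence group is assumed to contain plain letters only (nothing that is special inside a character class).

```python
import re

# Hypothetical table of interchangeable characters; extend as needed.
# Assumption: groups contain only plain letters (no ']', '^', '-' or '\').
EQUIVALENTS = [u"nñ"]


def build_pattern(user_input):
    """Build a regexp in which every character that belongs to an
    equivalence group matches any member of that group."""
    lookup = {}
    for group in EQUIVALENTS:
        for ch in group:
            lookup[ch] = u"[" + group + u"]"
    parts = []
    for ch in user_input:
        # Characters in a group become a character class; everything
        # else is escaped individually, so the result does not depend on
        # whether re.escape puts a backslash in front of 'ñ'.
        parts.append(lookup.get(ch.lower()) or re.escape(ch))
    return u"".join(parts)


pattern = build_pattern(u"el nino")
matches = re.findall(pattern, u"El ñino and el nino",
                     re.IGNORECASE | re.UNICODE)
```

Because only the characters outside a group go through re.escape, the same code should behave identically whether or not re.escape escapes non-ASCII letters.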
Date | User | Action | Args
2009-09-12 16:21:14 | bjourne | set | recipients: + bjourne, akuchling, amaury.forgeotdarc, belopolsky, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, ezio.melotti, mortenlj
2009-09-12 16:21:14 | bjourne | set | messageid: <1252772474.1.0.892937387078.issue2650@psf.upfronthosting.co.za>
2009-09-12 16:21:12 | bjourne | link | issue2650 messages
2009-09-12 16:21:12 | bjourne | create |