Author bjourne
Recipients akuchling, amaury.forgeotdarc, belopolsky, benjamin.peterson, bjourne, donlorenzo, ezio.melotti, mortenlj, pitrou, rsc, timehorse, zanella
Date 2009-09-12.16:21:12
Message-id <1252772474.1.0.892937387078.issue2650@psf.upfronthosting.co.za>
In-reply-to
Content
In my app, I need to transform the regexp created from user input so
that it matches Unicode characters and their ASCII equivalents
interchangeably. For example, if someone searches for "el nino", that
should match the string "el ñino". Similarly, searching for "el ñino"
should match "el nino".

The code to transform the regexp looks like this:

import re
s = re.escape(user_input)
# Replace each n or ñ with a character class that matches either one.
s = re.sub(u'n|ñ', u'[nñ]', s)
matches = re.findall(s, data, re.IGNORECASE|re.UNICODE)

It doesn't work because the ñ in the user_input is escaped with a
backslash. My workaround, to compensate for re.escape's too eager
escaping, is to escape the re.sub pattern:

s = re.sub(u'n|\\\\ñ', u'[nñ]', s)

It works, but it is not very nice. It would have been much better if
re.escape had worked as one would expect in the first place.
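
One way to avoid post-processing the output of re.escape is to build the pattern one character at a time, so the substitution never sees a backslash. The following is only a minimal sketch in Python 3, not the code from my app: the names EQUIVALENTS and build_pattern are hypothetical, and the table holds just the n/ñ pair from the example above.

```python
import re

# Hypothetical equivalence table (an assumption for illustration):
# each string lists characters that should match one another.
EQUIVALENTS = ["nñ"]


def build_pattern(user_input):
    """Return regex source in which every character that belongs to an
    equivalence group becomes a character class for that group, and
    every other character is escaped individually."""
    lookup = {ch: group for group in EQUIVALENTS for ch in group}
    parts = []
    for ch in user_input:
        group = lookup.get(ch.lower())
        if group is not None:
            # Escape the group members so the class stays valid
            # regardless of how re.escape treats non-ASCII letters.
            parts.append("[" + re.escape(group) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = build_pattern("el ñino")
matches = re.findall(pattern, "El nino and el ñino", re.IGNORECASE)
```

Because each character is escaped on its own, the result does not depend on whether re.escape puts a backslash in front of ñ, so the same code should behave identically across versions that differ on that point.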
History
Date                 User     Action  Args
2009-09-12 16:21:14  bjourne  set     recipients: + bjourne, akuchling, amaury.forgeotdarc, belopolsky, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, ezio.melotti, mortenlj
2009-09-12 16:21:14  bjourne  set     messageid: <1252772474.1.0.892937387078.issue2650@psf.upfronthosting.co.za>
2009-09-12 16:21:12  bjourne  link    issue2650 messages
2009-09-12 16:21:12  bjourne  create