Message 92548 - Python tracker


Author	bjourne
Recipients	akuchling, amaury.forgeotdarc, belopolsky, benjamin.peterson, bjourne, donlorenzo, ezio.melotti, mortenlj, pitrou, rsc, timehorse, zanella
Date	2009-09-12.16:21:12
Message-id	<1252772474.1.0.892937387078.issue2650@psf.upfronthosting.co.za>
In-reply-to

Content
In my app, I need to transform the regexp created from user input so
that it matches Unicode characters and their ASCII equivalents
interchangeably. For example, if someone searches for "el nino", that
should match the string "el ñino". Similarly, searching for "el ñino"
should match "el nino".

The code to transform the regexp looks like this:

s = re.escape(user_input)
s = re.sub(u'n|ñ', u'[n|ñ]', s)
matches = list(re.findall(s, data, re.IGNORECASE|re.UNICODE))

It doesn't work because the ñ in user_input is escaped with a
backslash. My workaround, to compensate for re.escape's too-eager
escaping, is to escape the re.sub pattern as well:

s = re.sub(u'\\\\n|\\\\ñ', u'[\\\\n|\\\\ñ]', s)

It works, but it is not very nice. It would be much better if
re.escape worked as one would expect in the first place.
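
One possible way to sidestep the problem described above is to build the
pattern one character at a time, so that the character classes go in
before anything is escaped and no second re.sub pass over escaped text is
needed. This is only a sketch under some assumptions: EQUIVALENTS and
build_pattern are hypothetical names (not part of the re module), the
table covers just n and ñ, and the code targets Python 3, where str
patterns are Unicode-aware by default.

```python
import re

# Hypothetical equivalence table: each character maps to the set of
# characters it should match. Extend it for other letters as needed.
EQUIVALENTS = {"n": "nñ", "ñ": "nñ"}


def build_pattern(user_input):
    """Return a regexp matching user_input, treating n and ñ as equal."""
    parts = []
    for ch in user_input:
        group = EQUIVALENTS.get(ch.lower())
        if group is not None:
            # Emit a character class instead of the literal character.
            parts.append("[" + group + "]")
        else:
            # Escape every other character individually so that regexp
            # metacharacters in the input stay literal.
            parts.append(re.escape(ch))
    return "".join(parts)


data = "El nino and el ñino"
matches = re.findall(build_pattern("el ñino"), data, re.IGNORECASE)
print(matches)  # ['El nino', 'el ñino']
```

Since each character is escaped on its own, the result does not depend on
whether re.escape puts a backslash in front of ñ.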

History
Date	User	Action	Args
2009-09-12 16:21:14	bjourne	set	recipients: + bjourne, akuchling, amaury.forgeotdarc, belopolsky, pitrou, rsc, timehorse, benjamin.peterson, zanella, donlorenzo, ezio.melotti, mortenlj
2009-09-12 16:21:14	bjourne	set	messageid: <1252772474.1.0.892937387078.issue2650@psf.upfronthosting.co.za>
2009-09-12 16:21:12	bjourne	link	issue2650 messages
2009-09-12 16:21:12	bjourne	create