Author pitrou
Recipients gvanrossum, humitos, pitrou, sven.siegmund
Date 2008-06-28.20:27:23
Message-id <1214684844.73.0.533760513068.issue2834@psf.upfronthosting.co.za>
In-reply-to
Content
Uh, actually, it works if you specify re.UNICODE. If you don't, the
getlower() function in _sre.c falls back to the plain ASCII algorithm.

>>> import re
>>> pat = re.compile('Á', re.IGNORECASE | re.UNICODE)
>>> pat.match('á')
<_sre.SRE_Match object at 0xb7c66c28>
>>> pat.match('Á')
<_sre.SRE_Match object at 0xb7c66cd0>

I wonder if re.UNICODE shouldn't be the default in Py3k, at least when
the pattern is a string and not a bytes object. There could also be a
re.ASCII flag for those cases where people want to fall back to the old
behaviour.
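
For illustration, here is a minimal sketch of how that proposal would look from the user's side. It assumes a Python 3 interpreter in which str patterns are Unicode-aware by default and an re.ASCII flag is available; the variable names are made up for the example and are not taken from _sre.c or the re module.

```python
import re

upper = '\u00c1'  # Á (LATIN CAPITAL LETTER A WITH ACUTE)
lower = '\u00e1'  # á (LATIN SMALL LETTER A WITH ACUTE)

# Assumption: for a str pattern, Unicode case folding is on by default,
# so re.UNICODE does not have to be passed explicitly.
default_pat = re.compile(upper, re.IGNORECASE)

# Assumption: re.ASCII restores the old ASCII-only case handling.
ascii_pat = re.compile(upper, re.IGNORECASE | re.ASCII)

default_hit = default_pat.match(lower) is not None  # case-insensitive, Unicode
ascii_hit = ascii_pat.match(lower) is not None      # case-insensitive, ASCII only
ascii_exact = ascii_pat.match(upper) is not None    # same character, no folding needed

print(default_hit, ascii_hit, ascii_exact)
```

Under those assumptions, only the ASCII-restricted pattern should fail to match the lowercase letter, while an exact match on the same character still succeeds.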