Title: Regular Expression inline flags not handled correctly for some unicode characters
Components: Regular Expressions Versions: Python 3.0, Python 2.4, Python 2.3, Python 2.6, Python 2.5
Status: closed Resolution: fixed
Assigned To: effbot Nosy List: effbot, gvanrossum, sonnq
Created on 2007-12-25 17:08 by sonnq, last changed 2008-01-03 19:13 by gvanrossum. This issue is now closed.

msg58993 - Author: Nguyen Quan Son (sonnq) Date: 2007-12-25 17:08
There is an inconsistency in the handling of RE inline flags (e.g. '(?iu)')
when the pattern contains certain Unicode characters, for example
characters in the range '\u1ea0' to '\u1ef9'.

Please see code attached for a demonstration of the problem.
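
The attachment is not reproduced in this thread. As a rough stand-in, here is a minimal sketch of the kind of check the report describes, written for Python 3. The helper names and the choice of '\u1ea0' / '\u1ea1' as the test pair are assumptions for illustration and are not taken from the original attachment.

```python
import re

# Assumed test pair from the reported range '\u1ea0'..'\u1ef9':
UPPER = "\u1ea0"  # LATIN CAPITAL LETTER A WITH DOT BELOW
LOWER = "\u1ea1"  # LATIN SMALL LETTER A WITH DOT BELOW


def matches_with_inline_flags(pattern_text, subject):
    """Case-insensitive match requested through the inline '(?iu)' prefix."""
    return re.match("(?iu)" + pattern_text, subject) is not None


def matches_with_flag_argument(pattern_text, subject):
    """Case-insensitive match requested through the flags argument."""
    flags = re.IGNORECASE | re.UNICODE
    return re.match(pattern_text, subject, flags) is not None


inline_result = matches_with_inline_flags(UPPER, LOWER)
argument_result = matches_with_flag_argument(UPPER, LOWER)

# The two ways of passing the flags should agree; the report describes
# affected versions where they did not for characters like these.
print(inline_result, argument_result)
```

If the two printed values differ, the interpreter shows the inconsistency described above.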
msg59081 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-02 02:48
I see this too.  Maybe Fredrik understands?
msg59092 - Author: Fredrik Lundh (effbot) * (Python committer) Date: 2008-01-02 11:17
Looks like the wrong execution flags are being passed to the function
that creates the actual pattern object; the SRE compiler does the right
thing, but the engine isn't running with the right flags in the last
case.  Changing the call to _sre.compile to:

    return _sre.compile(
        pattern, flags | p.pattern.flags, code,
        groupindex, indexgroup
        )

should do the trick, I think.  (got no time to fix my broken Python SVN
setup right now, but if someone wants to verify this and add the
necessary tests to the test suite, be my guest).
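
One way to observe the effect of merging the parsed flags into the call is through the public API: the compiled pattern object should report the inline flags in its flags attribute. The following is only a sketch for Python 3. The helper name effective_flags is hypothetical, and this is not the test that was actually added to the test suite.

```python
import re


def effective_flags(pattern_text, flags=0):
    """Return the flags recorded on the compiled pattern object."""
    return re.compile(pattern_text, flags).flags


# Flag given inline in the pattern text.
inline = effective_flags("(?i)\u1ea0")
# Same flag given through the flags argument.
explicit = effective_flags("\u1ea0", re.IGNORECASE)

# With the flags merged as suggested above, IGNORECASE should be set
# on the pattern object in both cases.
print(bool(inline & re.IGNORECASE), bool(explicit & re.IGNORECASE))
```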
msg59143 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-03 19:13
Committed revision 59674.  (2.5.2 branch)
Committed revision 59675.  (2.6 trunk)
