classification
Title: Regular Expression inline flags not handled correctly for some unicode characters
Type: behavior
Stage:
Components: Regular Expressions
Versions: Python 3.0, Python 2.4, Python 2.3, Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: effbot
Nosy List: effbot, gvanrossum, sonnq
Priority: normal
Keywords:

Created on 2007-12-25 17:08 by sonnq, last changed 2008-01-03 19:13 by gvanrossum. This issue is now closed.

Files
File name Uploaded Description
re_unicode_flag.py sonnq, 2007-12-25 17:08
Messages (4)
msg58993 - Author: Nguyen Quan Son (sonnq) Date: 2007-12-25 17:08
RE inline flags (e.g. '(?iu)') are handled inconsistently when the
pattern contains certain Unicode characters, for example characters in
the range '\u1ea0' to '\u1ef9'.

Please see the attached code for a demonstration of the problem.
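
The attached script is not reproduced on this page. As a rough sketch of the kind of check the report describes, the snippet below compares the inline form '(?iu)' with the same flags passed as an argument, for two lowercase/uppercase pairs from the reported range. It assumes Python 3 string patterns; `flags_agree` and `pairs` are illustrative names, not taken from re_unicode_flag.py.

```python
import re

# Hypothetical helper (not from the attached re_unicode_flag.py): reports
# whether inline flags and flags passed as an argument give the same
# match/no-match result for one pattern/text pair.
def flags_agree(pattern, text):
    inline = re.compile('(?iu)' + pattern).match(text)
    explicit = re.compile(pattern, re.I | re.U).match(text)
    return (inline is None) == (explicit is None)

# Lowercase/uppercase pairs from the range named in the report.
pairs = [('\u1ea1', '\u1ea0'), ('\u1ef9', '\u1ef8')]

for lower, upper in pairs:
    print(repr(lower), repr(upper), flags_agree(lower, upper))
```

On an interpreter that includes the fix, the two forms should agree for every pair; a disagreement would correspond to the inconsistency reported here.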
msg59081 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-02 02:48
I see this too.  Maybe Fredrik understands?
msg59092 - Author: Fredrik Lundh (effbot) * (Python committer) Date: 2008-01-02 11:17
Looks like the wrong execution flags are being passed to the function
that creates the actual pattern object; the SRE compiler does the right
thing, but the engine isn't running with the right flags in the last
case.  Changing the call to _sre.compile in sre_compile.py to:

    return _sre.compile(
        pattern, flags | p.pattern.flags, code,
        p.pattern.groups-1,
        groupindex, indexgroup
        )

should do the trick, I think.  (got no time to fix my broken Python SVN
setup right now, but if someone wants to verify this and add the
necessary tests to the test suite, be my guest).
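
The suggested change combines the flags passed by the caller with the flags the parser collected from the pattern itself. The internal call is not shown here beyond the snippet above, but the combined result can be observed through the public `flags` attribute of a compiled pattern. This is only a sketch, assuming a Python 3 interpreter that includes the fix; the variable names are illustrative.

```python
import re

# Inline flag only: nothing is passed through the flags argument.
inline_only = re.compile('(?i)abc')

# Flags from both sources: re.MULTILINE as an argument, (?i) inline.
both = re.compile('(?i)abc', re.MULTILINE)

# The compiled pattern's flags should contain the flags from both sources.
print(bool(inline_only.flags & re.IGNORECASE))
print(bool(both.flags & re.IGNORECASE), bool(both.flags & re.MULTILINE))
```

If an inline flag were missing from the flags handed to the pattern object, the first check would be the one to fail.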
msg59143 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-03 19:13
Committed revision 59674.  (2.5.2 branch)
Committed revision 59675.  (2.6 trunk)
History
Date User Action Args
2008-01-03 19:13:07 gvanrossum set status: open -> closed
resolution: fixed
messages: + msg59143
versions: + Python 2.6, Python 3.0
2008-01-02 11:17:44 effbot set messages: + msg59092
2008-01-02 02:48:42 gvanrossum set assignee: effbot
messages: + msg59081
nosy: + effbot, gvanrossum
2007-12-25 17:08:04 sonnq create