Note

To allow testing and comparison with the current 're' module, the new implementation is provided as a separate module called 'regex'.

Please note that some aspects of this module have changed from previous versions so that it conforms more closely to the 're' module.

Building for 64-bit

If the source files are built for a 64-bit target then the string positions will also be 64-bit. (The 're' module appears to limit string positions to 32 bits, even on a 64-bit build.)

Flags

There are two kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on.

The scoped flags are: IGNORECASE, MULTILINE, DOTALL, VERBOSE, WORD.

The global flags are: ASCII, LOCALE, NEW, REVERSE, UNICODE.
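As a rough illustration of the difference between the two kinds of flag, the sketch below applies IGNORECASE to one subpattern only and REVERSE to a whole pattern. It assumes the 'regex' module is installed and that the scoped-subpattern syntax (?i:...) is available in the release being used; the sample strings are made up for the example.

```python
import regex  # third-party module; assumed to be installed

# Scoped flag: IGNORECASE is limited to the (?i:...) subpattern,
# so " world" must still match case-sensitively.
scoped_hit = regex.search(r"(?i:hello) world", "HELLO world")
scoped_miss = regex.search(r"(?i:hello) world", "HELLO WORLD")

# Global flag: REVERSE applies to the entire pattern, so the search
# starts at the end of the string and works backwards.
forward = regex.search(r"\d+", "12 34 56")
backward = regex.search(r"\d+", "12 34 56", flags=regex.REVERSE)

print(scoped_hit is not None, scoped_miss is None)
print(forward.group(), backward.group())
```

The forward search should find the first run of digits and the reverse search the last one, while the scoped pattern should reject the all-uppercase string.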

The NEW flag turns on the new behaviour of this module, which can differ from that of the 're' module: splitting on zero-width matches, inline flags that affect only what follows them, and the ability to turn inline flags off.

Note: The ZEROWIDTH flag, which was present in previous versions of this module, has been removed. Use the NEW flag instead.

Notes on named capture groups

All capture groups have a group number, starting from 1.

Groups with the same group name will have the same group number, and groups with a different group name will have a different group number.

The same group name can be used on different branches of an alternation because the branches are mutually exclusive, e.g. (?P<foo>first)|(?P<foo>second). Those groups therefore share a single group number.

Group numbers are reused, where possible, across the branches of a branch reset, e.g. (?|(first)|(second)) has only group 1. Capture groups with different names still get different group numbers, e.g. (?|(?P<foo>first)|(?P<bar>second)) has group 1 ("foo") and group 2 ("bar").
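The numbering rules above can be checked with a short sketch like the following. It assumes the 'regex' module is installed and that compiled patterns expose 'groups' and 'groupindex' attributes in the same way as those of the 're' module; the variable names are purely illustrative.

```python
import regex  # third-party module; assumed to be installed

# The same name on two branches of an alternation: one shared group number.
same_name = regex.compile(r"(?P<foo>first)|(?P<foo>second)")
print(same_name.groups, dict(same_name.groupindex))
print(same_name.match("second").group("foo"))

# Branch reset with unnamed groups: both branches reuse group 1.
branch_reset = regex.compile(r"(?|(first)|(second))")
print(branch_reset.groups)
print(branch_reset.match("second").group(1))

# Branch reset with different names: the numbers cannot be shared.
named_reset = regex.compile(r"(?|(?P<foo>first)|(?P<bar>second))")
print(named_reset.groups, dict(named_reset.groupindex))
print(named_reset.match("second").group("bar"))
```

Note that the first pattern would be rejected by the 're' module, which does not allow a group name to be defined twice.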

Unicode

This module supports Unicode 6.0.0.

Additional features