classification
Title: Make RE "a", "L" and "u" inline flags local
Type: enhancement Stage: patch review
Components: Library (Lib), Regular Expressions Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: barry, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-10-04 14:02 by serhiy.storchaka, last changed 2017-10-05 12:16 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 3872 inada.naoki, 2017-10-04 14:11
PR 3885 open serhiy.storchaka, 2017-10-04 16:38
Messages (3)
msg303693 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 14:02
Currently re supports local inline flags: 'a(?i:b)' matches 'a' case-sensitively but 'b' case-insensitively. However, the 'a' and 'L' flags can't be scoped to a subpattern; they can be applied only to the whole pattern. The 'u' flag is currently just redundant: it has no effect in string patterns and is not allowed in bytes patterns. I think it would be nice to make all three flags local.
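
For illustration, the existing scoped behaviour of the 'i' flag can be checked with a short script. This is only a sketch: the names `pattern`, `candidates` and `matched` are made up for the example and are not part of any patch.

```python
import re

# Only the 'b' inside the (?i:...) group is matched case-insensitively;
# the leading 'a' must still match exactly.
pattern = re.compile(r'a(?i:b)')

candidates = ['ab', 'aB', 'Ab', 'AB']
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['ab', 'aB']
```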

An example of a problem this can solve is issue31672. Currently '[a-z]' in Unicode case-insensitive mode matches not only the Latin letters from 'a' to 'z' and from 'A' to 'Z', but also the characters 'İ' and 'ı' (equivalent to 'i'), 'ſ' (equivalent to 's') and 'K' (U+212A, the Kelvin sign, equivalent to 'k'). With local 'a' and 'u' flags you can use ASCII and Unicode ranges in the same pattern.
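
A minimal sketch of what mixing the two modes in one pattern could look like, assuming an interpreter that accepts 'a' and 'u' as local flags (Python 3.7 or later). The pattern and the names `KELVIN`, `mixed`, `ascii_then_unicode` and `unicode_then_ascii` are invented for this illustration.

```python
import re

KELVIN = '\u212a'  # KELVIN SIGN, which is case-insensitively equal to 'k'

# Whole pattern is case-insensitive; the first range is restricted to
# ASCII rules, the second uses Unicode rules.
mixed = re.compile(r'(?i:(?a:[a-z])(?u:[a-z]))')

# ASCII 'k' in the ASCII slot, Kelvin sign in the Unicode slot.
ascii_then_unicode = mixed.fullmatch('k' + KELVIN) is not None
# Kelvin sign in the ASCII slot is expected to be rejected.
unicode_then_ascii = mixed.fullmatch(KELVIN + 'k') is not None

print(ascii_then_unicode, unicode_then_ascii)
```

Here the first string should match and the second should not, since the Kelvin sign is only treated as a case variant of 'k' under Unicode rules.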

I'm working on the patch.
msg303712 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 16:54
PR 3885 is a preliminary but working implementation. It still needs tests and documentation.

>>> import re
>>> re.findall('(?i:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz', 'İı', 'ſ', 'K']
>>> re.findall('(?ia:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz']

The engine now uses separate opcodes for case-insensitive matching in ASCII, UNICODE and LOCALE modes. This may slightly speed up matching but slow down compiling.
msg303759 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-05 12:16
Added tests and documentation.
History
Date User Action Args
2017-10-05 12:16:42 serhiy.storchaka set messages: + msg303759
2017-10-04 16:54:06 serhiy.storchaka set messages: + msg303712
2017-10-04 16:38:23 serhiy.storchaka set pull_requests: + pull_request3860
2017-10-04 14:11:55 inada.naoki set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request3858
2017-10-04 14:03:47 barry set nosy: + barry
2017-10-04 14:02:56 serhiy.storchaka create