classification
Title: Make RE "a", "L" and "u" inline flags local
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: barry, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-10-04 14:02 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3872 methane, 2017-10-04 14:11
PR 3885 merged serhiy.storchaka, 2017-10-04 16:38
Messages (4)
msg303693 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 14:02
Currently re supports local inline flags: 'a(?i:b)' matches 'a' case-sensitively but 'b' case-insensitively. However, the 'a' and 'L' flags can't be scoped to a subpattern; they can be applied only to the whole pattern. The 'u' flag is currently just redundant: it has no effect in string patterns and is not allowed in bytes patterns. I think it would be nice to make them local.
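A minimal sketch of the already supported scoped 'i' flag described above. The variable name and the sample strings are illustrative assumptions, not part of the original report.

```python
import re

# Only the 'b' inside the group is matched case-insensitively;
# the leading 'a' stays case-sensitive.
scoped_i = re.compile(r'a(?i:b)')

matches_lower = scoped_i.fullmatch('ab') is not None
matches_upper_b = scoped_i.fullmatch('aB') is not None
matches_upper_a = scoped_i.fullmatch('Ab') is not None

print(matches_lower, matches_upper_b, matches_upper_a)
```

The last check is expected to fail to match, because the 'i' flag does not apply outside the group.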

An example of a problem that this can solve is issue31672. Currently '[a-z]' in Unicode case-insensitive mode matches not only the Latin letters from 'a' to 'z' and from 'A' to 'Z', but also the characters 'İ', 'ı', 'ſ' and 'K', which are equivalent to 'i' (the first two), 's' and 'k' respectively. With local 'a' and 'u' flags you can use ASCII and Unicode ranges in the same pattern.
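A rough sketch of mixing an ASCII-scoped range with Unicode matching in one pattern, assuming Python 3.7 or later, where the local 'a' flag is available. The key=value shape of the pattern and the sample strings are hypothetical and chosen only for illustration.

```python
import re

# Case-insensitive overall; the key is restricted to ASCII letters by the
# local 'a' flag, while the value still uses Unicode-aware \w.
scoped = re.compile(r'(?i)(?a:[a-z]+)=(\w+)')

# The same pattern without the ASCII-scoped group, for comparison.
unscoped = re.compile(r'(?i)[a-z]+=(\w+)')

ascii_key = 'KEY=caf\u00e9'       # 'KEY=café'
long_s_key = '\u017f=caf\u00e9'   # key is 'ſ' (LATIN SMALL LETTER LONG S)

print(scoped.fullmatch(ascii_key) is not None)
print(scoped.fullmatch(long_s_key) is not None)
print(unscoped.fullmatch(long_s_key) is not None)
```

In the scoped pattern the long s is rejected as a key, while the Unicode value after '=' is still accepted; the unscoped pattern accepts the long s because it is case-insensitively equivalent to 's'.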

I'm working on the patch.
msg303712 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 16:54
PR 3885 is a preliminary but working implementation. It still needs new tests and documentation.

>>> import re
>>> re.findall('(?i:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz', 'İı', 'ſ', 'K']
>>> re.findall('(?ia:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz']

The engine now uses separate opcodes for case-insensitive matching in ASCII, UNICODE and LOCALE modes. This may slightly speed up matching, but slow down compiling.
msg303759 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-05 12:16
Added tests and documentation.
msg304939 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-24 20:31
New changeset 3557b05c5a7dfd7d97ddfd3b79aefd53d25e5132 by Serhiy Storchaka in branch 'master':
bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885)
https://github.com/python/cpython/commit/3557b05c5a7dfd7d97ddfd3b79aefd53d25e5132
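A short sketch of how the group flags behave after this change, assuming Python 3.7 or later. The helper `compiles` is a hypothetical name introduced here for illustration; the restrictions it probes follow the documented rules that 'L' is only for bytes patterns, 'u' is only for str patterns, and 'a', 'L' and 'u' cannot be combined or turned off.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

results = {
    'str a':      compiles(r'(?a:\w+)'),    # ASCII group in a str pattern
    'str u':      compiles(r'(?u:\w+)'),    # Unicode group in a str pattern
    'bytes L':    compiles(rb'(?L:\w+)'),   # locale group in a bytes pattern
    'bytes a':    compiles(rb'(?a:\w+)'),   # ASCII group in a bytes pattern
    'str L':      compiles(r'(?L:\w+)'),    # 'L' is not allowed with str
    'bytes u':    compiles(rb'(?u:\w+)'),   # 'u' is not allowed with bytes
    'combined':   compiles(r'(?au:\w+)'),   # the three flags are exclusive
    'turned off': compiles(r'(?-a:\w+)'),   # they cannot follow '-'
}

for name, ok in results.items():
    print(f'{name}: {ok}')
```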
History
Date User Action Args
2022-04-11 14:58:53 admin set github: 75871
2017-10-24 20:33:52 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-10-24 20:31:44 serhiy.storchaka set messages: + msg304939
2017-10-05 12:16:42 serhiy.storchaka set messages: + msg303759
2017-10-04 16:54:06 serhiy.storchaka set messages: + msg303712
2017-10-04 16:38:23 serhiy.storchaka set pull_requests: + pull_request3860
2017-10-04 14:11:55 methane set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request3858
2017-10-04 14:03:47 barry set nosy: + barry
2017-10-04 14:02:56 serhiy.storchaka create