Make RE "a", "L" and "u" inline flags local #75871

Closed
serhiy-storchaka opened this issue Oct 4, 2017 · 4 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

BPO 31690
Nosy @warsaw, @ezio-melotti, @serhiy-storchaka
PRs
  • bpo-31672: string: Use re.A | re.I flag for identifier pattern #3872
  • bpo-31690: Make "a", "L" and "u" inline flags in regular expressions local. #3885
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-10-24.20:33:52.493>
    created_at = <Date 2017-10-04.14:02:56.019>
    labels = ['expert-regex', 'type-feature', 'library', '3.7']
    title = 'Make RE "a", "L" and "u" inline flags local'
    updated_at = <Date 2017-10-24.20:33:52.492>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-10-24.20:33:52.492>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-10-24.20:33:52.493>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2017-10-04.14:02:56.019>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31690
    keywords = ['patch']
    message_count = 4.0
    messages = ['303693', '303712', '303759', '304939']
    nosy_count = 4.0
    nosy_names = ['barry', 'ezio.melotti', 'mrabarnett', 'serhiy.storchaka']
    pr_nums = ['3872', '3885']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue31690'
    versions = ['Python 3.7']

    @serhiy-storchaka
    Member Author

    Currently re supports local inline flags: 'a(?i:b)' matches 'a' case-sensitively but 'b' case-insensitively. However, the 'a' and 'L' flags can't be scoped to a subpattern, and the 'u' flag is currently just redundant: it has no effect in string patterns and is not allowed in bytes patterns. All three can be applied only to the whole pattern. I think it would be nice to make them local.
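For illustration, here is a minimal sketch of the scoped-flag behavior described above, using only the existing `(?i:...)` syntax. The variable names are illustrative and not taken from the issue.

```python
import re

# (?i:...) turns on IGNORECASE only for the subpattern inside the group.
scoped = re.compile(r'a(?i:b)')

lower_b = scoped.fullmatch('ab') is not None  # 'b' matches as written
upper_b = scoped.fullmatch('aB') is not None  # 'B' matches through the local flag
upper_a = scoped.fullmatch('Ab') is not None  # 'A' is outside the group, so no match

print(lower_b, upper_b, upper_a)  # True True False
```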

    An example of a problem this can solve is bpo-31672. Currently '[a-z]' in Unicode case-insensitive mode matches not only the Latin letters from 'a' to 'z' and from 'A' to 'Z', but also the characters 'İ', 'ı', 'ſ' and 'K', which are case-equivalent to 'i', 's' and 'k'. With local 'a' and 'u' flags you can use ASCII and Unicode ranges in the same pattern.
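A rough sketch of what mixing ASCII and Unicode matching in one pattern could look like, assuming Python 3.7 or later (the version this issue targets). The pattern and sample strings are made up for illustration.

```python
import re

# Assumes Python 3.7+, where 'a' and 'u' are accepted as group-local flags.
# First word: ASCII-only \w; second word: full Unicode \w.
mixed = re.compile(r'(?a:\w+)-(?u:\w+)')

ascii_then_unicode = mixed.fullmatch('abc-déf') is not None
unicode_then_ascii = mixed.fullmatch('déf-abc') is not None

print(ascii_then_unicode, unicode_then_ascii)  # True False
```

The second string fails because 'é' is not a word character under the ASCII-scoped group.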

    I'm working on the patch.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 4, 2017
    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 4, 2017
    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement labels Oct 4, 2017
    @serhiy-storchaka
    Member Author

    PR 3885 is a preliminary but working implementation. It still needs new tests and documentation.

    >>> import re
    >>> re.findall('(?i:[a-z]+)', ''.join(map(chr, range(0x10000))))
    ['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz', 'İı', 'ſ', 'K']
    >>> re.findall('(?ia:[a-z]+)', ''.join(map(chr, range(0x10000))))
    ['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz']

    The engine now uses separate opcodes for case-insensitive matching in ASCII, UNICODE and LOCALE modes. This may slightly speed up matching but slow down compiling.
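The findall comparison above can also be narrowed to the four non-ASCII characters involved. This is a small sketch assuming Python 3.7 or later. The code points are written as escapes so the Kelvin sign is not confused with the ASCII letter 'K', and the list names are illustrative.

```python
import re

# The four non-ASCII characters that '[a-z]' matches case-insensitively
# in Unicode mode: U+0130, U+0131, U+017F and U+212A (Kelvin sign).
extras = ['\u0130', '\u0131', '\u017f', '\u212a']

# Unicode case-insensitive matching (the default for str patterns).
unicode_hits = [c for c in extras if re.fullmatch(r'(?i:[a-z])', c)]

# ASCII-only case-insensitive matching via the local 'a' flag (assumes 3.7+).
ascii_hits = [c for c in extras if re.fullmatch(r'(?ia:[a-z])', c)]

print(len(unicode_hits), len(ascii_hits))
```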

    @serhiy-storchaka
    Member Author

    Added tests and documentation.

    @serhiy-storchaka
    Member Author

    New changeset 3557b05 by Serhiy Storchaka in branch 'master':
    bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885)
    3557b05

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022