Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate the use of flags not at the start of regular expression #66683

Closed
serhiy-storchaka opened this issue Sep 25, 2014 · 21 comments
Closed
Assignees
Labels
stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 22493
Nosy @pitrou, @ezio-melotti, @serhiy-storchaka, @timgraham
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • re_deprecate_nonstart_flags.patch
  • re_deprecate_nested_flags.patch
  • re_deprecate_nonstart_flags2.patch
  • better-warning.diff
  • better-warning-2.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-09-16.22:33:54.254>
    created_at = <Date 2014-09-25.10:29:00.775>
    labels = ['expert-regex', 'type-feature', 'library']
    title = 'Deprecate the use of flags not at the start of regular expression'
    updated_at = <Date 2018-05-27.08:53:56.829>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2018-05-27.08:53:56.829>
    actor = 'aleskva'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-09-16.22:33:54.254>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2014-09-25.10:29:00.775>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['36721', '36845', '44526', '44675', '44680']
    hgrepos = []
    issue_num = 22493
    keywords = ['patch']
    message_count = 21.0
    messages = ['227520', '228841', '228930', '228931', '228959', '275601', '275758', '276559', '276562', '276566', '276583', '276620', '276621', '276648', '276650', '276746', '276757', '276758', '279567', '317786', '317788']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'Tim.Graham', 'aleskva']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22493'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    The meaning of inline flags not at the start of regular expression is ambiguous. Current re implementation and regex in the V0 mode enlarge the scope to all expression. In V1 mode in regex they affect only the end of the expression.

    I propose to deprecate (and then forbid in 3.7) the use of inline flags not at the start of regular expression. This will help to change the meaning of inline flags in the middle of the expression in future (in 3.8 or later).

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement labels Sep 25, 2014
    @serhiy-storchaka
    Copy link
    Member Author

    Here is alternative, much simpler, patch. It deprecates only flags in nested subpatterns. No changes needed in tests and other stdlib modules. It is very unlikely that it is used in third party code.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 9, 2014

    Here is alternative, much simpler, patch. It deprecates only flags in nested subpatterns.

    That sounds a bit random. It wouldn't totally address the discrepancy with regex, would it?
    MRAB, what do you think on this topic?

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Oct 9, 2014

    I think the simplest and clearest approach from the user's point of view is just to deprecate inline flags that are not at the start of the pattern.

    In practice, they almost invariably occur at the start anyway, although I do remember once seeing a pattern in which the inline flag was at the end!

    @serhiy-storchaka
    Copy link
    Member Author

    That sounds a bit random. It wouldn't totally address the discrepancy with regex, would it?

    No, it will not totally address the discrepancy with regex, but at least it will allow as to change the behavior of flags in subpatterns. And we always can convert a pattern to a subpattern (surround by "(?:" and ")").

    For now Python re module is only one regular expression implementation in which flags in the middle of the expression affect all expression. [1]

    [1] http://www.regular-expressions.info/modifiers.html

    @serhiy-storchaka serhiy-storchaka self-assigned this Sep 7, 2016
    @serhiy-storchaka
    Copy link
    Member Author

    Updated patch that deprecates flags not at the start. fnmatch.translate() now uses scoped flags (bpo-433028).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 11, 2016

    New changeset 31f8af1c3567 by Serhiy Storchaka in branch 'default':
    Issue bpo-22493: Inline flags now should be used only at the start of the
    https://hg.python.org/cpython/rev/31f8af1c3567

    @timgraham
    Copy link
    Mannequin

    timgraham mannequin commented Sep 15, 2016

    Could we include the offending pattern in the deprecation message? I'm attaching a proposed patch. With that patch I can more easily find the offending pattern, whereas before I had no idea:

    django/django/urls/resolvers.py:101: DeprecationWarning: Flags not at the start of the expression ^(?i)test/2/?$
    compiled_regex = re.compile(regex, re.UNICODE)

    @timgraham
    Copy link
    Mannequin

    timgraham mannequin commented Sep 15, 2016

    And on further investigation, I'm not sure how to fix the deprecation warnings in Django. We have a urlpattern like this:

    url(r'^(?i)CaseInsensitive/(\w+)', empty_view, name="insensitive"),

    The regex string r'^(?i)CaseInsensitive/(\w+)' is later substituted in this line in Django's URL resolver as the pattern:

    if re.search('^%s%s' % (re.escape(_prefix), pattern), candidate_pat % candidate_subs, re.UNICODE):

    It seems Django would need to extract any flags from pattern and put them at the start of the '^%s%s' string that's constructed for re.search(). I'm not sure if this can be done easily.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Sep 15, 2016

    @tim: Why are you using re.search with '^'? Does the pattern that's passed in ever contain '(?m)'? If not, re.match without '^' is better.

    @timgraham
    Copy link
    Mannequin

    timgraham mannequin commented Sep 15, 2016

    Looks like we could remove the '^', but it doesn't resolve the deprecation warnings. The inline flags in pattern still need to be moved before _prefix.

    @serhiy-storchaka
    Copy link
    Member Author

    Thanks Tim, this is great idea! I consider this as usability bug and going to apply a fix to 3.6.

    But regular expression can be generated and be very long. I think it should be truncated before including in a warning message.

    As for Django, you can use (?i:CaseInsensitive) in 3.7 (unless _prefix is case sensitive, but you want to make it case sensitive if pattern is case sensitive).

    @serhiy-storchaka
    Copy link
    Member Author

    In tests you can either add re.escape(), or escape special characters manually (r'\(\?i\)'). What you prefer.

    @timgraham
    Copy link
    Mannequin

    timgraham mannequin commented Sep 16, 2016

    Adding an updated patch.

    I guess the (?i:CaseInsensitive) syntax isn't merged yet? I tried it but it didn't work. It might be premature to proceed with this deprecation if that alternative isn't already present. Is there an issue for it?

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Sep 16, 2016

    I downloaded Python 3.6.0b1 not long after it was released and it works for me:

    >>> re.match('(?i:CaseInsensitive)', 'caseinsensitive')
    <_sre.SRE_Match object; span=(0, 15), match='caseinsensitive'>

    @timgraham
    Copy link
    Mannequin

    timgraham mannequin commented Sep 16, 2016

    Yes, I found that Django needs an update to support that syntax in URLpatterns. Thanks.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 16, 2016

    New changeset c35a528268fd by Serhiy Storchaka in branch '3.6':
    Issue bpo-22493: Warning message emitted by using inline flags in the middle of
    https://hg.python.org/cpython/rev/c35a528268fd

    New changeset 9d0f4da4d531 by Serhiy Storchaka in branch 'default':
    Issue bpo-22493: Warning message emitted by using inline flags in the middle of
    https://hg.python.org/cpython/rev/9d0f4da4d531

    @serhiy-storchaka
    Copy link
    Member Author

    Patch LGTM (but I changed tests a little). Thanks Tim!

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 27, 2016

    New changeset c04a56b3a4f2 by Serhiy Storchaka in branch '3.6':
    Issue bpo-22493: Updated an example for fnmatch.translate().
    https://hg.python.org/cpython/rev/c04a56b3a4f2

    New changeset ded9a3c3bbb6 by Serhiy Storchaka in branch 'default':
    Issue bpo-22493: Updated an example for fnmatch.translate().
    https://hg.python.org/cpython/rev/ded9a3c3bbb6

    @aleskva
    Copy link
    Mannequin

    aleskva mannequin commented May 27, 2018

    Maybe there should be introduced some method to merge patterns as just piping patterns will not work:
    pats = [r'(?m)^line.continues$', r'(?s)begin.*?end']
    re.compile('|'.join(pats))

    @aleskva
    Copy link
    Mannequin

    aleskva mannequin commented May 27, 2018

    See also https://bugs.python.org/issue33658

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-regex type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants