classification
Title: Deprecate the use of flags not at the start of regular expression
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Tim.Graham, ezio.melotti, mrabarnett, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-09-25 10:29 by serhiy.storchaka, last changed 2016-10-27 19:50 by python-dev. This issue is now closed.

Files
File name Uploaded
re_deprecate_nonstart_flags.patch serhiy.storchaka, 2014-09-25 10:28
re_deprecate_nested_flags.patch serhiy.storchaka, 2014-10-09 08:51
re_deprecate_nonstart_flags2.patch serhiy.storchaka, 2016-09-10 08:42
better-warning.diff Tim.Graham, 2016-09-15 14:41
better-warning-2.diff Tim.Graham, 2016-09-16 00:18
Messages (19)
msg227520 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-25 10:28
The meaning of inline flags that are not at the start of a regular expression is ambiguous. The current re implementation, and regex in V0 mode, enlarge their scope to the whole expression. In V1 mode in regex they affect only the rest of the expression, from the flag to the end.

I propose to deprecate (and then forbid in 3.7) the use of inline flags not at the start of a regular expression. This will make it possible to change the meaning of inline flags in the middle of the expression in the future (in 3.8 or later).
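To check how a particular interpreter treats a flag that is not at the start, something like the following probe could be used. It is only a sketch: `classify_pattern` is a hypothetical helper, not part of `re`, and it assumes that the deprecation surfaces as a `DeprecationWarning` and that a later prohibition surfaces as `re.error`.

```python
import re
import warnings


def classify_pattern(pattern):
    """Return 'accepted', 'deprecated' or 'rejected' for this interpreter.

    Hypothetical helper, for illustration only.
    """
    re.purge()  # clear the pattern cache so the pattern is really compiled again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        try:
            re.compile(pattern)
        except re.error:
            return 'rejected'
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return 'deprecated'
    return 'accepted'


print(classify_pattern(r'(?i)abc'))  # flag at the start of the expression
print(classify_pattern(r'abc(?i)'))  # flag at the end of the expression
```

The result for the second pattern depends on the Python version, so no expected output is given for it.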
msg228841 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-09 08:51
Here is an alternative, much simpler patch. It deprecates only flags in nested subpatterns. No changes are needed in tests or other stdlib modules. It is very unlikely that such flags are used in third-party code.
msg228930 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-09 22:29
> Here is an alternative, much simpler patch. It deprecates only flags in nested subpatterns.

That sounds a bit random. It wouldn't totally address the discrepancy with regex, would it?
MRAB, what do you think about this?
msg228931 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2014-10-09 23:29
I think the simplest and clearest approach from the user's point of view is just to deprecate inline flags that are not at the start of the pattern.

In practice, they almost invariably occur at the start anyway, although I do remember once seeing a pattern in which the inline flag was at the end!
msg228959 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-10 07:30
> That sounds a bit random. It wouldn't totally address the discrepancy with regex, would it?

No, it will not totally address the discrepancy with regex, but at least it will allow us to change the behavior of flags in subpatterns. And we can always convert a pattern to a subpattern (by surrounding it with "(?:" and ")").

For now, Python's re module is the only regular expression implementation in which flags in the middle of the expression affect the whole expression. [1]

[1] http://www.regular-expressions.info/modifiers.html
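The conversion of a pattern to a subpattern mentioned above can be sketched as follows. `as_subpattern` is a hypothetical helper name; the example only shows why the wrapping matters when a pattern is embedded in a larger one.

```python
import re


def as_subpattern(pattern):
    """Wrap `pattern` in a non-capturing group (hypothetical helper)."""
    return '(?:' + pattern + ')'


alternatives = 'cat|dog'

# Without the group, the trailing 's' binds only to 'dog'.
plain = re.compile(alternatives + 's')
# With the group, the 's' applies to either alternative.
grouped = re.compile(as_subpattern(alternatives) + 's')

print(bool(plain.fullmatch('cats')))    # False
print(bool(grouped.fullmatch('cats')))  # True
```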
msg275601 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-10 08:42
Updated patch that deprecates flags not at the start. fnmatch.translate() now uses scoped flags (issue433028).
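One practical consequence of fnmatch.translate() using scoped flags is that its output can be embedded in a larger expression. The sketch below assumes a Python version where translate() emits scoped flags; the glob patterns and file names are made up for illustration.

```python
import fnmatch
import re

# Translate two shell-style globs and join them into one alternation.
parts = [fnmatch.translate(glob) for glob in ('*.py', '*.txt')]
combined = re.compile('|'.join('(?:' + part + ')' for part in parts))

print(bool(combined.match('setup.py')))    # True
print(bool(combined.match('notes.txt')))   # True
print(bool(combined.match('module.pyc')))  # False
```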
msg275758 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-11 09:50
New changeset 31f8af1c3567 by Serhiy Storchaka in branch 'default':
Issue #22493: Inline flags now should be used only at the start of the
https://hg.python.org/cpython/rev/31f8af1c3567
msg276559 - (view) Author: Tim Graham (Tim.Graham) * Date: 2016-09-15 14:41
Could we include the offending pattern in the deprecation message? I'm attaching a proposed patch. With that patch I can more easily find the offending pattern, whereas before I had no idea:

django/django/urls/resolvers.py:101: DeprecationWarning: Flags not at the start of the expression ^(?i)test/2/?$
  compiled_regex = re.compile(regex, re.UNICODE)
msg276562 - (view) Author: Tim Graham (Tim.Graham) * Date: 2016-09-15 15:00
And on further investigation, I'm not sure how to fix the deprecation warnings in Django. We have a urlpattern like this:

  url(r'^(?i)CaseInsensitive/(\w+)', empty_view, name="insensitive"),

The regex string r'^(?i)CaseInsensitive/(\w+)' is later substituted in this line in Django's URL resolver as the `pattern`:

if re.search('^%s%s' % (re.escape(_prefix), pattern), candidate_pat % candidate_subs, re.UNICODE):

It seems Django would need to extract any flags from `pattern` and put them at the start of the '^%s%s' string that's constructed for re.search(). I'm not sure if this can be done easily.
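A rough sketch of what such an extraction might look like is below. It is not Django's code: `hoist_inline_flags` is a hypothetical helper, and its simple matching ignores escaping and character classes. Note also that hoisting a flag makes it apply to everything that follows, including any prefix placed after it.

```python
import re

# Matches a global inline flag group such as (?i) or (?ms). This simple pattern
# ignores escaping and character classes, so it is a rough sketch only.
_GLOBAL_FLAGS = re.compile(r'\(\?([aiLmsux]+)\)')


def hoist_inline_flags(pattern):
    """Return (flag_prefix, pattern_without_global_flags). Hypothetical helper."""
    found = []

    def collect(match):
        found.append(match.group(1))
        return ''  # remove the flag group from the pattern body

    body = _GLOBAL_FLAGS.sub(collect, pattern)
    letters = ''.join(sorted(set(''.join(found))))
    prefix = '(?%s)' % letters if letters else ''
    return prefix, body


flags, body = hoist_inline_flags(r'^(?i)CaseInsensitive/(\w+)')
print(flags)  # (?i)
print(body)   # ^CaseInsensitive/(\w+)
print(re.match(flags + body, 'caseinsensitive/42').group(1))  # 42
```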
msg276566 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2016-09-15 15:33
@Tim: Why are you using re.search with '^'? Does the pattern that's passed in ever contain '(?m)'? If not, re.match without '^' is better.
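The difference behind this question can be shown with a small example. The sample text is made up: with '(?m)', '^' in re.search() also matches at the start of each line, whereas re.match() always anchors at the start of the string.

```python
import re

text = 'first\nsecond'

# Without (?m), '^' matches only at the very start of the string.
print(re.search('^second', text) is None)          # True
# With (?m), '^' also matches right after each newline.
print(re.search('(?m)^second', text) is not None)  # True
# re.match() anchors at the start of the string, with or without (?m).
print(re.match('(?m)second', text) is None)        # True
```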
msg276583 - (view) Author: Tim Graham (Tim.Graham) * Date: 2016-09-15 18:01
Looks like we could remove the '^', but it doesn't resolve the deprecation warnings. The inline flags in `pattern` still need to be moved before `_prefix`.
msg276620 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-15 21:00
Thanks Tim, this is a great idea! I consider this a usability bug and am going to apply a fix to 3.6.

But a regular expression can be generated and can be very long. I think it should be truncated before being included in a warning message.
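The truncation idea could look roughly like the sketch below. The helper name, the 20-character limit and the '...' marker are all assumptions chosen for illustration, not necessarily what the committed change uses.

```python
def shorten_for_message(pattern, limit=20):
    """Return `pattern` cut to `limit` characters, marked if it was cut.

    Hypothetical helper; the limit and the marker are illustrative choices.
    """
    if len(pattern) <= limit:
        return pattern
    return pattern[:limit] + '...'


# A long, machine-generated pattern versus a short hand-written one.
generated = '|'.join('word%d' % i for i in range(1000))

print(shorten_for_message('^(?i)test/2/?$'))  # short pattern is kept whole
print(len(shorten_for_message(generated)))    # 23
```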

As for Django, you can use (?i:CaseInsensitive) in 3.6 (unless _prefix is case sensitive, but you want to make it case sensitive if pattern is case sensitive).
msg276621 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-15 21:05
In tests you can either use re.escape() or escape the special characters manually (r'\(\?i\)'), whichever you prefer.
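Both options can be sketched as follows, assuming a test that checks the warning text with a regular expression. The message string is the one quoted earlier in this thread; the variable names are illustrative.

```python
import re

message = 'Flags not at the start of the expression ^(?i)test/2/?$'

# Option 1: let re.escape() neutralise every special character.
escaped = re.escape('^(?i)test/2/?$')
# Option 2: escape by hand only the part the test cares about.
manual = r'\(\?i\)'

print(bool(re.search(escaped, message)))  # True
print(bool(re.search(manual, message)))   # True
```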
msg276648 - (view) Author: Tim Graham (Tim.Graham) * Date: 2016-09-16 00:18
Adding an updated patch.

I guess the (?i:CaseInsensitive) syntax isn't merged yet? I tried it but it didn't work. It might be premature to proceed with this deprecation if that alternative isn't already present. Is there an issue for it?
msg276650 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2016-09-16 00:58
I downloaded Python 3.6.0b1 not long after it was released and it works for me:

>>> re.match('(?i:CaseInsensitive)', 'caseinsensitive')
<_sre.SRE_Match object; span=(0, 15), match='caseinsensitive'>
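Building on that, a scoped flag can keep a prefix case-sensitive while the rest of the pattern is not. This is a sketch for Python 3.6 or later; the '/Admin/' prefix and the sample paths are hypothetical.

```python
import re

prefix = '/Admin/'  # hypothetical prefix that should stay case-sensitive
pattern = re.compile('^' + re.escape(prefix) + r'(?i:CaseInsensitive)/(\w+)')

# The scoped group ignores case; the escaped prefix does not.
print(bool(pattern.match('/Admin/caseinsensitive/42')))  # True
print(bool(pattern.match('/admin/caseinsensitive/42')))  # False
```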
msg276746 - (view) Author: Tim Graham (Tim.Graham) * Date: 2016-09-16 20:24
Yes, I found that Django needs an update to support that syntax in URLpatterns. Thanks.
msg276757 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-16 22:31
New changeset c35a528268fd by Serhiy Storchaka in branch '3.6':
Issue #22493: Warning message emitted by using inline flags in the middle of
https://hg.python.org/cpython/rev/c35a528268fd

New changeset 9d0f4da4d531 by Serhiy Storchaka in branch 'default':
Issue #22493: Warning message emitted by using inline flags in the middle of
https://hg.python.org/cpython/rev/9d0f4da4d531
msg276758 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-16 22:33
Patch LGTM (but I changed tests a little). Thanks Tim!
msg279567 - (view) Author: Roundup Robot (python-dev) Date: 2016-10-27 19:50
New changeset c04a56b3a4f2 by Serhiy Storchaka in branch '3.6':
Issue #22493: Updated an example for fnmatch.translate().
https://hg.python.org/cpython/rev/c04a56b3a4f2

New changeset ded9a3c3bbb6 by Serhiy Storchaka in branch 'default':
Issue #22493: Updated an example for fnmatch.translate().
https://hg.python.org/cpython/rev/ded9a3c3bbb6
History
Date | User | Action | Args
2016-10-27 19:50:50 | python-dev | set | messages: + msg279567
2016-09-16 22:33:54 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg276758; stage: patch review -> resolved
2016-09-16 22:31:10 | python-dev | set | messages: + msg276757
2016-09-16 20:24:15 | Tim.Graham | set | messages: + msg276746
2016-09-16 00:58:24 | mrabarnett | set | messages: + msg276650
2016-09-16 00:18:12 | Tim.Graham | set | files: + better-warning-2.diff; messages: + msg276648
2016-09-15 21:05:02 | serhiy.storchaka | set | messages: + msg276621
2016-09-15 21:00:47 | serhiy.storchaka | set | messages: + msg276620
2016-09-15 20:41:51 | serhiy.storchaka | set | status: closed -> open; resolution: fixed -> (no value); stage: resolved -> patch review
2016-09-15 18:01:28 | Tim.Graham | set | messages: + msg276583
2016-09-15 15:33:40 | mrabarnett | set | messages: + msg276566
2016-09-15 15:00:17 | Tim.Graham | set | messages: + msg276562
2016-09-15 14:41:37 | Tim.Graham | set | files: + better-warning.diff; nosy: + Tim.Graham; messages: + msg276559
2016-09-13 06:29:01 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2016-09-11 09:50:22 | python-dev | set | nosy: + python-dev; messages: + msg275758
2016-09-10 08:42:22 | serhiy.storchaka | set | files: + re_deprecate_nonstart_flags2.patch; messages: + msg275601
2016-09-07 14:58:35 | serhiy.storchaka | set | assignee: serhiy.storchaka; versions: + Python 3.6, - Python 3.5
2014-10-10 07:30:30 | serhiy.storchaka | set | messages: + msg228959
2014-10-09 23:29:13 | mrabarnett | set | messages: + msg228931
2014-10-09 22:29:20 | pitrou | set | messages: + msg228930
2014-10-09 08:51:18 | serhiy.storchaka | set | files: + re_deprecate_nested_flags.patch; messages: + msg228841
2014-09-25 10:29:00 | serhiy.storchaka | create |