This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: re: DeprecationWarning for `flag not at the start of expression` is cutoff too early
Type: enhancement Stage: resolved
Components: Regular Expressions Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, ezio.melotti, jugmac00, miss-islington, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-01-20 10:44 by jugmac00, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL | Status | Linked
PR 31988 | merged | serhiy.storchaka, 2022-03-19 11:40
PR 31989 | merged | miss-islington, 2022-03-19 12:13
PR 31990 | merged | miss-islington, 2022-03-19 12:13
Messages (8)
msg360306 - Author: Jürgen Gmach (jugmac00) * Date: 2020-01-20 10:44
The usage of flags not at the start of an expression is deprecated.

Also see "Deprecate the use of flags not at the start of regular expression" / https://bugs.python.org/issue22493 

A deprecation warning is issued, but the quoted expression is cut off after 20 characters.

For complex expressions this is far too short.
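
The behaviour can be reproduced with a short pattern. The sketch below uses a hypothetical helper, `misplaced_flag_diagnostic` (not part of the `re` module), and assumes it is acceptable to escalate the `DeprecationWarning` to an exception for a one-off check. It also covers Python 3.11 and later, where the same pattern raises `re.error` instead of warning.

```python
import re
import warnings


def misplaced_flag_diagnostic(pattern):
    """Hypothetical helper: return None if `pattern` compiles cleanly,
    otherwise a short description of why compiling it warned or failed."""
    with warnings.catch_warnings():
        # Escalate the DeprecationWarning (Python 3.6-3.10) to an exception,
        # so the problem is reported even when warnings are normally hidden.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except DeprecationWarning as warning:
            # Older versions: the message quotes at most 20 characters.
            return str(warning)
        except re.error as exc:
            # Python 3.11+: a hard error that carries the position.
            return f"{exc.msg} at position {exc.pos}"
    return None


print(misplaced_flag_diagnostic(r"(?i)abc"))  # flag at the start: None
print(misplaced_flag_diagnostic(r"abc(?i)"))  # flag after "abc": reported
```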

Example ( https://github.com/jedie/python-creole/issues/31 ):

current output

/home/jugmac00/Projects/bliss_deployment/work/_/home/jugmac00/.batou-shared-eggs/python_creole-1.3.2-py3.7.egg/creole/parser/creol2html_parser.py:48
  /home/jugmac00/Projects/bliss_deployment/work/_/home/jugmac00/.batou-shared-eggs/python_creole-1.3.2-py3.7.egg/creole/parser/creol2html_parser.py:48: DeprecationWarning: Flags not at the start of the expression '(?P<image>\n         ' (truncated)
    re.VERBOSE | re.UNICODE


output with patched sre_parse.py

creole/parser/creol2html_parser.py:51
  /home/jugmac00/Projects/python-creole/creole/parser/creol2html_parser.py:51: DeprecationWarning: Flags not at the start of the expression '\n            \\| \\s*\n            (\n                (?P<head> [=][^|]+ ) |\n                (?P<cell> (  (?P<link>\n            \\[\\[\n            (?P<link_target>.+?) \\s*\n            ([|] \\s* (?P<link_text>.+?) \\s*)?\n            ]]\n        )|\n        (?P<macro_inline>\n        << \\s* (?P<macro_inline_start>\\w+) \\s* (?P<macro_inline_args>.*?) \\s* >>\n        (?P<macro_inline_text>(.|\\n)*?)\n        <</ \\s* (?P=macro_inline_start) \\s* >>\n        )\n    |(?P<macro_tag>\n            <<(?P<macro_tag_name> \\w+) (?P<macro_tag_args>.*?) \\s* /*>>\n        )|(?i)(?P<image>\n            {{\n            (?P<image_target>.+?) \\s*\n            (\\| \\s* (?P<image_text>.+?) \\s*)?\n            }}\n        )|(?P<pre_inline> {{{ (?P<pre_inline_text>.*?) }}} ) | [^|])+ )\n            ) \\s*\n        '
    cell_re = re.compile(x, re.VERBOSE | re.UNICODE)


(The line number differs because the source changed between these two test runs.)

I would like to create a PR that removes the 20-character limit completely, but wanted to get feedback before I do so.

The deprecation warning was added by Tim Graham; maybe he could explain why it was cut at 20 characters in the first place?
https://github.com/python/cpython/commit/abf275af5804c5f76fbe10c5cb1dd3d2e4b04c5b
msg360308 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-20 11:54
Why do you want to output the full regular expression? Are the source file path, line number, and first 20 characters not enough to identify the affected regular expression?
msg360316 - Author: Jürgen Gmach (jugmac00) * Date: 2020-01-20 13:03
> Why do you want to output the full regular expression?

The current output gives no clue about which flag is problematic, does not show the complete expression (which would at least include the problematic flag), and does not show the exact line, as it refers only to the line where compile gets called.

The warning points to the following line ( https://github.com/jedie/python-creole/blob/4e74f29daaf5026a3d4d6dae9f2e74f5f3655439/creole/parser/creol2html_parser.py#L49-L50 ):

cell_re = re.compile(SpecialRules.cell, re.VERBOSE | re.UNICODE)


And SpecialRules.cell is defined in quite a big class ( https://github.com/jedie/python-creole/blob/4e74f29daaf5026a3d4d6dae9f2e74f5f3655439/creole/parser/creol2html_rules.py#L16-L97 ) containing lots of partial expressions.

Even after spotting this line ( https://github.com/jedie/python-creole/blob/4e74f29daaf5026a3d4d6dae9f2e74f5f3655439/creole/parser/creol2html_rules.py#L54 ), at first glance it looks as if it starts with the flag and should be correct (but, as it turned out later, it is not).
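
This trap is easy to hit when a pattern is assembled from smaller pieces: a piece that begins with `(?i)` is fine on its own, but once it is joined after another alternative, the flag is no longer at the start of the whole expression. A minimal sketch with hypothetical sub-patterns, loosely modelled on the creole rules (not the package's actual code), showing one possible remedy: a scoped inline flag group `(?i:...)`, available since Python 3.6, which applies the flag only inside the group.

```python
import re

# Hypothetical building blocks.
LINK = r"link:(?P<href>\S+)"
# Global flag: correct on its own, but joining it after LINK would put
# (?i) mid-expression, which triggers the DeprecationWarning
# (an re.error on Python 3.11+).
IMAGE_GLOBAL = r"(?i)img:(?P<src>\S+)"
# Scoped flag: case-insensitive matching applies only inside the group.
IMAGE_SCOPED = r"(?i:img:(?P<src>\S+))"

combined = "|".join([LINK, IMAGE_SCOPED])
cell_re = re.compile(combined)

print(cell_re.fullmatch("IMG:photo.png").group("src"))  # photo.png
print(cell_re.fullmatch("LINK:page"))  # None: LINK stays case-sensitive
```

Moving `(?i)` to the very beginning of the combined string (or passing `re.IGNORECASE` to `re.compile`) would also avoid the warning, but would make every alternative case-insensitive.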


> Are the source file path, line number, and first 20 characters not enough to identify the affected regular expression?

It definitely was not enough for me (I am new to this code base and was only trying to report deprecation warnings in my application), and, as this comment ( https://github.com/jedie/python-creole/issues/31#issuecomment-575983117 ) shows, it was not even enough for the author/maintainer of this package.

Do you expect any downside to printing the complete warning?
msg393802 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-17 09:53
I have to admit that I find the truncated version more readable.

Some sort of truncation is useful, as a regex could be thousands of characters long.

Adding the offset to the warning message seems like a useful addition.
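
A sketch of this compromise, keeping the 20-character truncation while appending the offset. The function `format_flag_warning` and its `limit` parameter are hypothetical and only illustrate the idea; the wording CPython eventually adopted may differ.

```python
def format_flag_warning(pattern, pos, limit=20):
    """Hypothetical formatter: quote at most `limit` characters of the
    pattern, as the original warning did, and append the flag's offset."""
    head = pattern[:limit]
    suffix = " (truncated)" if len(pattern) > limit else ""
    return (f"Flags not at the start of the expression {head!r}{suffix}"
            f" but at position {pos}")


long_pattern = "x" * 30 + "(?i)y"
print(format_flag_warning(long_pattern, long_pattern.index("(?i)")))
# Flags not at the start of the expression 'xxxxxxxxxxxxxxxxxxxx' (truncated) but at position 30
```

The message stays short however long the pattern is, while the offset still points at the flag.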
msg415541 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-19 11:45
This warning was introduced in 3.6. It is time to convert it into an error. RE error messages contain the position.

But I understand that very few users will be using 3.11 in the near future, so I am going to add the position to the warning message and backport this change. It is not a bugfix in the strict sense, but I think it is safe to backport.
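
In a long multi-line `re.VERBOSE` pattern such as the creole one, a bare character offset is still awkward to use. The sketch below converts it to a 1-based line and column the same way `re.error` computes its `lineno` and `colno` attributes. The helper name `locate` and the example pattern are hypothetical.

```python
def locate(pattern, pos):
    """Hypothetical helper: map a character offset in `pattern` to a
    1-based (line, column) pair, mirroring re.error.lineno/colno."""
    lineno = pattern.count("\n", 0, pos) + 1
    # rfind returns -1 when no newline precedes pos, giving column pos + 1.
    colno = pos - pattern.rfind("\n", 0, pos)
    return lineno, colno


# A small VERBOSE-style pattern with the global flag on its second line.
pattern = "\n".join([
    r"(?P<link> \[\[ .+? \]\] )",
    r"|(?i)(?P<image> \{\{ .+? \}\} )",
])
print(locate(pattern, pattern.index("(?i)")))  # (2, 2)
```

On Python 3.11 and later, `re.compile(pattern, re.VERBOSE)` raises `re.error` for this pattern, and its `pos`, `lineno` and `colno` attributes should agree with `locate`.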
msg415545 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-19 12:13
New changeset 4142961b9f5ad3bf93976a6a7162f8049e354018 by Serhiy Storchaka in branch 'main':
bpo-39394: Improve warning message in the re module (GH-31988)
https://github.com/python/cpython/commit/4142961b9f5ad3bf93976a6a7162f8049e354018
msg415550 - Author: miss-islington (miss-islington) Date: 2022-03-19 14:09
New changeset 906f1a4a95e9ca82171a40a28b16533a14fa339c by Miss Islington (bot) in branch '3.10':
bpo-39394: Improve warning message in the re module (GH-31988)
https://github.com/python/cpython/commit/906f1a4a95e9ca82171a40a28b16533a14fa339c
msg415551 - Author: miss-islington (miss-islington) Date: 2022-03-19 14:10
New changeset cbcd2e36d6cbb1d8b6a2b30a2cf1484b7857e7d6 by Miss Islington (bot) in branch '3.9':
bpo-39394: Improve warning message in the re module (GH-31988)
https://github.com/python/cpython/commit/cbcd2e36d6cbb1d8b6a2b30a2cf1484b7857e7d6
History
Date | User | Action | Args
2022-04-11 14:59:25 | admin | set | github: 83575
2022-03-19 14:11:50 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2022-03-19 14:10:02 | miss-islington | set | messages: + msg415551
2022-03-19 14:09:53 | miss-islington | set | messages: + msg415550
2022-03-19 12:13:57 | miss-islington | set | pull_requests: + pull_request30081
2022-03-19 12:13:53 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request30080
2022-03-19 12:13:36 | serhiy.storchaka | set | messages: + msg415545
2022-03-19 11:45:47 | serhiy.storchaka | set | messages: + msg415541; versions: + Python 3.10, Python 3.11, - Python 3.7, Python 3.8
2022-03-19 11:40:36 | serhiy.storchaka | set | keywords: + patch; stage: patch review; pull_requests: + pull_request30079
2021-05-17 09:53:33 | Mark.Shannon | set | nosy: + Mark.Shannon; messages: + msg393802
2020-01-20 13:03:11 | jugmac00 | set | messages: + msg360316
2020-01-20 11:59:41 | vstinner | set | title: DeprecationWarning for `flag not at the start of expression` is cutoff too early -> re: DeprecationWarning for `flag not at the start of expression` is cutoff too early
2020-01-20 11:54:08 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg360308
2020-01-20 10:44:04 | jugmac00 | create |