This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Unexpected behavior of re module when VERBOSE flag is set
Type: behavior
Stage: resolved
Components: Library (Lib), Regular Expressions
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed
Resolution: not a bug
Nosy List: bkline, ezio.melotti, mrabarnett
Priority: normal

Created on 2017-10-23 23:25 by bkline, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name | Uploaded | Description
regex-repro.py | bkline, 2017-10-23 23:25 | Repro case for issue
Messages (4)
msg304849 - Author: Bob Kline (bkline) - Date: 2017-10-23 23:25
According to the documentation of the re module, "When this flag [re.VERBOSE] has been specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by a '#' that’s neither in a character class [n]or preceded by an unescaped backslash." (I'm quoting from the 3.6.3 documentation, but I've tested with several versions of Python, as indicated in the issue's `Versions` field, all with the same results.)
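The documented rules can be checked with a few one-line patterns. This is a minimal sketch using made-up patterns and input text (not taken from the attached repro); each comment names the non-verbose pattern that the verbose one should be equivalent to.

```python
import re

text = "a b ab"

# Unescaped whitespace and the trailing "#" comment are ignored: same as r"ab".
ignored = re.findall(r"a b  # two letters", text, re.VERBOSE)

# Whitespace inside a character class is kept: same as r"a[ ]b".
in_class = re.findall(r"a [ ] b", text, re.VERBOSE)

# Whitespace preceded by a backslash is kept: same as r"a b".
escaped = re.findall(r"a \  b", text, re.VERBOSE)

print(ignored, in_class, escaped)  # ['ab'] ['a b'] ['a b']
```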

Given this description, I would have expected the output for each of the pairs of calls to findall() in the attached repro code to be the same, but that is not what's happening. In the case of the first pair of calls, for example, the non-verbose version finds two more matches than the verbose version, even though the regular expression is identical for the two calls, ignoring whitespace and comments in the expression string. Similar problems appear with the other two pairs of calls.

Here's the output from the attached code:

['&', '(', '/Term/SemanticType/@cdr:ref', '==']
['/Term/SemanticType/@cdr:ref', '==']
[' XXX ']
[]
[' XXX ']
[]

It would seem that at least one of the following is true:

 1. the module is not behaving as it should
 2. the documentation is wrong
 3. I have not understood the documentation correctly

I'm happy for it to be #3, as long as someone can explain what I have not understood.
msg304852 - Author: Matthew Barnett (mrabarnett) (Python triager) - Date: 2017-10-23 23:55
Your verbose examples put the pattern into raw triple-quoted strings, which is OK, but their first character is a backslash, which makes the next character (a newline) an escaped literal whitespace character. Escaped whitespace is significant in a verbose pattern.
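A minimal sketch of the effect described above, using a hypothetical pattern (the contents of regex-repro.py are not reproduced here). Because the raw string begins with a backslash followed by a newline, the compiled pattern must start by matching a literal newline, so it no longer behaves like its compact, non-verbose counterpart.

```python
import re

# Hypothetical pattern. In a raw string, backslash + newline is kept as two
# characters, so the pattern text begins with an escaped (significant) newline.
verbose = re.compile(r"""\
    X+      # one or more X characters
""", re.VERBOSE)

# The same pattern without the whitespace and comment.
plain = re.compile(r"X+")

plain_hits = plain.findall("XXX")            # ['XXX']
verbose_hits = verbose.findall("XXX")        # [] because a newline is required first
verbose_nl_hits = verbose.findall("\nXXX")   # ['\nXXX']
print(plain_hits, verbose_hits, verbose_nl_hits)
```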
msg304853 - Author: Bob Kline (bkline) - Date: 2017-10-24 00:52
I had been under the impression that "escaped" in this context meant that an escape character (the backslash) was part of the string value for the regular expression (there's a little bit of overloading going on with that word). Thanks for setting me straight.
msg304856 - Author: Bob Kline (bkline) - Date: 2017-10-24 03:36
The light finally comes on. I actually *was* putting a backslash into the string value, with the raw flag (which is, of course, what you were trying to tell me). Thanks for your patience. :-)
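The distinction here is between the Python string literal and what the regex engine receives. In an ordinary (non-raw) string, a backslash at the end of a line is a line continuation and never reaches re; in a raw string, the backslash and the newline are both kept in the value. The sketch below illustrates this and shows one possible fix (dropping the leading backslash, since an unescaped newline is ignored in verbose mode); the pattern is illustrative only.

```python
import re

# Raw string: backslash + newline stay in the value (two characters).
raw = r"""\
"""
# Ordinary string: backslash + newline is a line continuation and vanishes.
cooked = """\
"""

print(repr(raw), repr(cooked))  # '\\\n' ''

# One possible fix: no leading backslash. The leading newline is now plain
# whitespace, which re.VERBOSE ignores, so the pattern is effectively r"X+".
fixed = re.compile(r"""
    X+      # one or more X characters
""", re.VERBOSE)
print(fixed.findall("XXX"))  # ['XXX']
```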
History
Date | User | Action | Args
2022-04-11 14:58:53 | admin | set | github: 76037
2017-10-24 03:36:50 | bkline | set | messages: + msg304856
2017-10-24 00:53:00 | bkline | set | messages: + msg304853
2017-10-23 23:55:55 | mrabarnett | set | status: open -> closed; resolution: not a bug; messages: + msg304852; stage: resolved
2017-10-23 23:25:05 | bkline | create