classification
Title: re.VERBOSE whitespace behavior not completely documented
Type: enhancement Stage: resolved
Components: Documentation, Regular Expressions Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Kevin Shweh, docs@python, ezio.melotti, mrabarnett, roysmith, serhiy.storchaka, stevencollins, zach.ware
Priority: normal Keywords: patch

Created on 2012-08-09 17:50 by stevencollins, last changed 2017-11-14 15:39 by serhiy.storchaka. This issue is now closed.

Files
File name | Uploaded | Description
re_whitespace.patch | stevencollins, 2012-08-11 19:27 | Proposed patch for re.VERBOSE docs (whitespace behavior)
Pull Requests
URL | Status | Linked
PR 4366 | merged | serhiy.storchaka, 2017-11-10 21:53
PR 4394 | merged | python-dev, 2017-11-14 15:22
PR 4395 | merged | python-dev, 2017-11-14 15:23
Messages (11)
msg167803 - Author: Steven Collins (stevencollins) Date: 2012-08-09 17:50
Given the way the documentation is written for re.VERBOSE - "Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash" - I would expect all three of the findall() commands below to return successfully with the same result:

Python 3.2.3 (default, Jun  8 2012, 05:37:15) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
['abaab']
>>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
    raise error("unexpected end of pattern")
sre_constants.error: unexpected end of pattern
>>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
    p = _parse_sub(source, state)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
>>> 

The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, the behavior should be changed, the documentation should be more clear about the current behavior, or at least the errors given should be more informative (I spent an hour or two debugging the "nothing to repeat" error in my work yesterday.) Thank you.
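To reproduce the report on another interpreter, a minimal sketch along these lines may help. It assumes only the standard `re` module, passes `re.VERBOSE` rather than the inline `(?x)` flag, and uses illustrative variable names (`subject`, `matches`, `failures`) that are not from the original session. The exact error messages vary between Python versions, so the sketch only records that `re.error` is raised.

```python
import re

subject = "abaabc"

# Whitespace *between* tokens is ignored in verbose mode.
matches = re.findall(r"(?: a | b ) +", subject, re.VERBOSE)
print(matches)

# Whitespace *inside* the "(?:" token is not ignored, so neither
# of these patterns compiles.
failures = {}
for pattern in (r"(? : a | b ) +", r"( ?: a | b ) +"):
    try:
        re.compile(pattern, re.VERBOSE)
    except re.error as exc:
        failures[pattern] = str(exc)

for pattern, message in failures.items():
    print(f"{pattern!r}: {message}")
```

Catching `re.error` instead of matching on the message text keeps the check independent of the interpreter version.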
msg167890 - Author: Matthew Barnett (mrabarnett) * Date: 2012-08-10 16:37
Ideally, yes, that whitespace should be ignored.

The question is whether it's worth fixing the code for the small case of when there's whitespace within "tokens", such as within "(?:". Usually those who use verbose mode use whitespace as in the first example rather than the second or third examples.
msg167999 - Author: Steven Collins (stevencollins) Date: 2012-08-11 19:27
Fair enough, but in that case I still think the current behavior should be documented. Attached is a possible patch. (This is my first interaction with the Python issue tracker, by the way; apologies if I ought to have set some field differently or left some other field alone.)
msg181928 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-11 19:53
See also related issue11204.
msg182174 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-02-15 21:08
See also #17184.
msg305158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-28 11:49
Steven, would you mind updating your patch according to the review comments and creating a pull request on GitHub?
msg306039 - Author: Kevin Shweh (Kevin Shweh) Date: 2017-11-10 17:19
It looks to me like there are more situations than the patch lists where whitespace still separates tokens. For example, *? is a reluctant quantifier and * ? is a syntax error, even in verbose mode.
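The quantifier case can be checked with a short sketch like the one below. The sample string and the names `lazy`, `greedy` and `split_error` are illustrative assumptions, not taken from the patch or its review.

```python
import re

text = "<a><b>"

# "*?" written as a single token is the reluctant (non-greedy) quantifier.
lazy = re.findall(r"< .*? >", text, re.VERBOSE)
# Plain "*" is greedy and swallows both tags in one match.
greedy = re.findall(r"< .* >", text, re.VERBOSE)
print(lazy, greedy)

# With a space between "*" and "?", the "?" is read as a second,
# separate quantifier, and the pattern fails to compile.
try:
    re.compile(r"< .* ? >", re.VERBOSE)
    split_error = None
except re.error as exc:
    split_error = str(exc)
print(split_error)
```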
msg306050 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-10 21:57
Steven's patch has been outdated since commit 71a0b43854164b6ada0026d90f241c987b54d019. However, that commit did not mention that spaces are not ignored within tokens. PR 4366 fixes this by using the wording from Ezio's comments.
msg306216 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:21
New changeset b0b44b4b3337297007f5ef87220a75df204399f8 by Serhiy Storchaka in branch 'master':
bpo-15606: Improve the re.VERBOSE documentation. (#4366)
https://github.com/python/cpython/commit/b0b44b4b3337297007f5ef87220a75df204399f8
msg306217 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:38
New changeset 14c1fe682f0086ec28f24fee9bf1c85d80507ee5 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4394)
https://github.com/python/cpython/commit/14c1fe682f0086ec28f24fee9bf1c85d80507ee5
msg306218 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-14 15:39
New changeset a2f1be0b5ba2bed49b7f94c026b541ff07e52518 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7':
bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4395)
https://github.com/python/cpython/commit/a2f1be0b5ba2bed49b7f94c026b541ff07e52518
History
Date | User | Action | Args
2017-11-14 15:39:50 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-11-14 15:39:06 | serhiy.storchaka | set | messages: + msg306218
2017-11-14 15:38:52 | serhiy.storchaka | set | messages: + msg306217
2017-11-14 15:23:36 | python-dev | set | pull_requests: + pull_request4343
2017-11-14 15:22:43 | python-dev | set | pull_requests: + pull_request4342
2017-11-14 15:21:28 | serhiy.storchaka | set | messages: + msg306216
2017-11-10 21:57:34 | serhiy.storchaka | set | nosy: + zach.ware; messages: + msg306050
2017-11-10 21:53:19 | serhiy.storchaka | set | stage: needs patch -> patch review; pull_requests: + pull_request4319
2017-11-10 17:19:46 | Kevin Shweh | set | nosy: + Kevin Shweh; messages: + msg306039
2017-10-28 11:49:45 | serhiy.storchaka | set | stage: patch review -> needs patch; messages: + msg305158; versions: + Python 2.7, Python 3.6, Python 3.7, - Python 3.3
2013-02-15 21:08:52 | ezio.melotti | set | messages: + msg182174
2013-02-15 21:08:18 | ezio.melotti | link | issue17184 superseder
2013-02-11 19:53:17 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg181928
2013-02-11 19:42:49 | roysmith | set | nosy: + roysmith
2012-09-15 23:32:10 | ezio.melotti | set | stage: patch review
2012-08-11 19:27:39 | stevencollins | set | files: + re_whitespace.patch; assignee: docs@python; keywords: + patch; versions: + Python 3.3, - Python 2.7, Python 3.2; nosy: + docs@python; title: re.VERBOSE doesn't ignore certain whitespace -> re.VERBOSE whitespace behavior not completely documented; messages: + msg167999; components: + Documentation; type: behavior -> enhancement
2012-08-10 16:37:05 | mrabarnett | set | messages: + msg167890
2012-08-09 17:50:17 | stevencollins | create |