re.VERBOSE whitespace behavior not completely documented #59811
Given the way the documentation for re.VERBOSE is written ("Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash"), I would expect all three of the findall() calls below to succeed and return the same result:

Python 3.2.3 (default, Jun 8 2012, 05:37:15)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
['abaab']
>>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
    raise error("unexpected end of pattern")
sre_constants.error: unexpected end of pattern
>>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
    p = _parse_sub(source, state)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
>>>

The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, the behavior should be changed, the documentation should describe the current behavior more clearly, or at least the error messages should be more informative. (Yesterday I spent an hour or two debugging the "nothing to repeat" error in my own code.) Thank you.
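The following is a minimal reproduction of the three patterns from the session above. It is written as a script instead of a REPL session. It assumes a current Python 3 release (3.5 or later, for the `re.error.msg` and `re.error.pos` attributes). The exact error messages differ between versions: the 3.2 session above reports "unexpected end of pattern" and "nothing to repeat". For that reason, the sketch prints the message and marks the failing position instead of relying on the wording. The names `PATTERNS` and `try_findall` are illustrative only.

```python
import re

# The three patterns from the report. In verbose mode, whitespace *between*
# tokens is ignored, but "(?:" is read as a single token, so splitting it
# with a space changes how the parser sees it.
PATTERNS = [
    "(?x) (?: a | b ) + ",   # whitespace only between tokens: compiles
    "(?x) (? : a | b ) + ",  # space inside "(?:": rejected
    "(?x) ( ?: a | b ) + ",  # "(" opens a capturing group; "?" has nothing to repeat
]


def try_findall(pattern, text):
    """Return (matches, None) on success or (None, re.error) on failure."""
    try:
        return re.findall(pattern, text), None
    except re.error as exc:
        return None, exc


if __name__ == "__main__":
    for pattern in PATTERNS:
        matches, err = try_findall(pattern, "abaabc")
        if err is None:
            print(f"{pattern!r}: {matches}")
        else:
            print(f"{pattern!r}: error: {err.msg}")
            # err.pos is the index in the pattern where compilation failed.
            # The +1 skips the opening quote that repr() adds above.
            if err.pos is not None:
                print(" " * (err.pos + 1) + "^")
```

The position marker addresses the complaint about uninformative errors. It points at the character where the parser gave up, which is usually closer to the real problem than the message alone.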
Ideally, yes, that whitespace should be ignored. The question is whether it is worth changing the code for the small case of whitespace within "tokens", such as within "(?:". Most people who use verbose mode place whitespace as in the first example, not as in the second or third.
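For comparison, here is the usual verbose layout the reply describes. Multi-character constructs such as `(?P<...>` and `+?` are kept intact, and whitespace and `#` comments appear only between them. The pattern and the name `KEY_VALUE` are hypothetical and do not come from the thread. The sketch also exercises the two exceptions that the documentation names: a space inside a character class, and a space preceded by a backslash, are both matched literally.

```python
import re

# Hypothetical pattern: a "key = value" line where the key is a run of
# ASCII letters and the value may contain spaces.
KEY_VALUE = re.compile(
    r"""
    (?P<key> [A-Za-z]+ )    # whitespace between tokens is ignored...
    \ = \                   # ...but a backslash-escaped space is literal
    (?P<value> [\w ]+? )    # a space inside a class is literal; "+?" has no inner space
    $
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    m = KEY_VALUE.match("name = Guido van Rossum")
    print(m.group("key"), "|", m.group("value"))
```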
Fair enough, but in that case I still think the current behavior should be documented. Attached is a possible patch. (This is my first interaction with the Python issue tracker, by the way; apologies if I ought to have set some field differently or left some other field alone.)
See also the related bpo-11204.
See also bpo-17184.
Steven, would you mind updating your patch to address the review comments and creating a pull request on GitHub?
It looks to me like there are more situations than the patch lists where whitespace still separates tokens. For example, *? is a reluctant (lazy) quantifier, whereas * ? is a syntax error, even in verbose mode.
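The quantifier case from the last comment can be checked the same way. This sketch assumes any current Python 3 release. In verbose mode, `.*?` is parsed as one lazy quantifier. `.* ?` is parsed as `*` followed by a separate `?`, and the parser rejects that as a repeated quantifier; it does not ignore the space. The helper `compiles` and the sample text are illustrative only.

```python
import re

TEXT = "<a><b>"

# Quantifiers written without internal whitespace behave as usual in
# verbose mode; whitespace *between* tokens is still ignored.
greedy = re.findall(r"(?x) < .* >", TEXT)
lazy = re.findall(r"(?x) < .*? >", TEXT)


def compiles(pattern):
    """Return True if the pattern compiles, False if re.error is raised."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(greedy)  # ['<a><b>']
    print(lazy)    # ['<a>', '<b>']
    # A space between "*" and "?" splits the lazy quantifier into two
    # quantifiers, so this pattern fails to compile even in verbose mode.
    print(compiles(r"(?x) < .* ? >"))  # False
```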