
Author stevencollins
Recipients ezio.melotti, mrabarnett, stevencollins
Date 2012-08-09.17:50:15
Message-id <1344534617.67.0.49137100671.issue15606@psf.upfronthosting.co.za>
Content
Given the way the documentation is written for re.VERBOSE - "Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash" - I would expect all three of the findall() commands below to return successfully with the same result:

Python 3.2.3 (default, Jun  8 2012, 05:37:15) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
['abaab']
>>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
    raise error("unexpected end of pattern")
sre_constants.error: unexpected end of pattern
>>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/re.py", line 193, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/lib/python3.2/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/usr/lib/python3.2/functools.py", line 184, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
    p = _parse_sub(source, state)
  File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
>>> 

The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, either the behavior should be changed, the documentation should be clearer about the current behavior, or at the very least the errors should be more informative (I spent an hour or two debugging the "nothing to repeat" error in my work yesterday). Thank you.
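
For anyone trying to reproduce this without reading through the tracebacks, here is a minimal sketch that compiles the three patterns from the session above and reports which ones are rejected. The helper name compile_error is made up for this illustration and is not part of the re module. The exact error text is deliberately not shown, since it may differ between Python versions.

```python
import re


def compile_error(pattern):
    """Return the re.error message for `pattern`, or None if it compiles.

    Hypothetical helper for this illustration only.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


patterns = [
    '(?x) (?: a | b ) + ',   # "(?:" written as one token
    '(?x) (? : a | b ) + ',  # whitespace between "(?" and ":"
    '(?x) ( ?: a | b ) + ',  # whitespace between "(" and "?:"
]

for p in patterns:
    # Print either the parser's complaint or a note that the pattern is fine.
    print(repr(p), '->', compile_error(p) or 'compiles')
```

Under the assumption that the parser still treats '(?:' as a single token, only the first pattern should compile, which suggests the practical workaround: in verbose mode, put whitespace around multi-character tokens such as '(?:' but never inside them.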
History
Date                 User           Action  Args
2012-08-09 17:50:17  stevencollins  set     recipients: + stevencollins, ezio.melotti, mrabarnett
2012-08-09 17:50:17  stevencollins  set     messageid: <1344534617.67.0.49137100671.issue15606@psf.upfronthosting.co.za>
2012-08-09 17:50:17  stevencollins  link    issue15606 messages
2012-08-09 17:50:15  stevencollins  create