re.VERBOSE whitespace behavior not completely documented #59811

Closed
stevencollins mannequin opened this issue Aug 9, 2012 · 11 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir topic-regex type-feature A feature request or enhancement

Comments

@stevencollins
Mannequin

stevencollins mannequin commented Aug 9, 2012

BPO 15606
Nosy @ezio-melotti, @zware, @serhiy-storchaka
PRs
  • bpo-15606: Improve re.VERBOSE documentation. #4366
  • [3.6] bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) #4394
  • [2.7] bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) #4395
  • Files
  • re_whitespace.patch: Proposed patch for re.VERBOSE docs (whitespace behavior)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-11-14.15:39:50.512>
    created_at = <Date 2012-08-09.17:50:17.086>
    labels = ['expert-regex', 'type-feature', '3.7', 'docs']
    title = 're.VERBOSE whitespace behavior not completely documented'
    updated_at = <Date 2017-11-14.15:39:50.511>
    user = 'https://bugs.python.org/stevencollins'

    bugs.python.org fields:

    activity = <Date 2017-11-14.15:39:50.511>
    actor = 'serhiy.storchaka'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2017-11-14.15:39:50.512>
    closer = 'serhiy.storchaka'
    components = ['Documentation', 'Regular Expressions']
    creation = <Date 2012-08-09.17:50:17.086>
    creator = 'stevencollins'
    dependencies = []
    files = ['26767']
    hgrepos = []
    issue_num = 15606
    keywords = ['patch']
    message_count = 11.0
    messages = ['167803', '167890', '167999', '181928', '182174', '305158', '306039', '306050', '306216', '306217', '306218']
    nosy_count = 8.0
    nosy_names = ['roysmith', 'ezio.melotti', 'mrabarnett', 'docs@python', 'zach.ware', 'serhiy.storchaka', 'stevencollins', 'Kevin Shweh']
    pr_nums = ['4366', '4394', '4395']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15606'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

    Given the way the documentation is written for re.VERBOSE - "Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash" - I would expect all three of the findall() commands below to return successfully with the same result:

    Python 3.2.3 (default, Jun  8 2012, 05:37:15) 
    [GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.findall('(?x) (?: a | b ) + ', 'abaabc')
    ['abaab']
    >>> re.findall('(?x) (? : a | b ) + ', 'abaabc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.2/re.py", line 193, in findall
        return _compile(pattern, flags).findall(string)
      File "/usr/lib/python3.2/re.py", line 255, in _compile
        return _compile_typed(type(pattern), pattern, flags)
      File "/usr/lib/python3.2/functools.py", line 184, in wrapper
        result = user_function(*args, **kwds)
      File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
        return sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
        p = _parse_sub(source, pattern, 0)
      File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
        itemsappend(_parse(source, state))
      File "/usr/lib/python3.2/sre_parse.py", line 627, in _parse
        raise error("unexpected end of pattern")
    sre_constants.error: unexpected end of pattern
    >>> re.findall('(?x) ( ?: a | b ) + ', 'abaabc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.2/re.py", line 193, in findall
        return _compile(pattern, flags).findall(string)
      File "/usr/lib/python3.2/re.py", line 255, in _compile
        return _compile_typed(type(pattern), pattern, flags)
      File "/usr/lib/python3.2/functools.py", line 184, in wrapper
        result = user_function(*args, **kwds)
      File "/usr/lib/python3.2/re.py", line 267, in _compile_typed
        return sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.2/sre_compile.py", line 491, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.2/sre_parse.py", line 692, in parse
        p = _parse_sub(source, pattern, 0)
      File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
        itemsappend(_parse(source, state))
      File "/usr/lib/python3.2/sre_parse.py", line 640, in _parse
        p = _parse_sub(source, state)
      File "/usr/lib/python3.2/sre_parse.py", line 315, in _parse_sub
        itemsappend(_parse(source, state))
      File "/usr/lib/python3.2/sre_parse.py", line 520, in _parse
        raise error("nothing to repeat")
    sre_constants.error: nothing to repeat
    >>> 

    The behavior is the same in Python 2.7. Apparently the scan for the special '(?' character sequences happens before the whitespace is stripped out. In my opinion, the behavior should be changed, the documentation should be more clear about the current behavior, or at least the errors given should be more informative (I spent an hour or two debugging the "nothing to repeat" error in my work yesterday.) Thank you.
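A minimal sketch for reproducing the report without the full tracebacks. The helper name `compiles` is hypothetical, and the sketch catches `re.error` in general rather than a specific message, because the error text on a newer interpreter may differ from the 3.2 output shown above.

```python
import re


def compiles(pattern):
    """Return True if `pattern` compiles under re.VERBOSE (hypothetical helper)."""
    try:
        re.compile(pattern, re.VERBOSE)
    except re.error:
        return False
    return True


# Whitespace between tokens is ignored, so this behaves like (?:a|b)+
intact = r" (?: a | b ) + "
# Whitespace inside the "(?:" token is not ignored, so these are rejected
split_after_question = r" (? : a | b ) + "
split_before_question = r" ( ?: a | b ) + "

print(compiles(intact))
print(compiles(split_after_question))
print(compiles(split_before_question))
print(re.findall(intact, "abaabc", re.VERBOSE))
```

Only the first pattern compiles; the other two fail at compile time, before any matching is attempted.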

    @stevencollins stevencollins mannequin added type-bug An unexpected behavior, bug, or error topic-regex labels Aug 9, 2012
    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Aug 10, 2012

    Ideally, yes, that whitespace should be ignored.

    The question is whether it's worth fixing the code for the small case of when there's whitespace within "tokens", such as within "(?:". Usually those who use verbose mode use whitespace as in the first example rather than the second or third examples.
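For context, the conventional layout referred to here keeps every token intact and places whitespace and comments only between tokens. The pattern below is an illustrative assumption, not something taken from this issue:

```python
import re

# Verbose layout: whitespace and comments sit between tokens, never inside one.
number = re.compile(r"""
    (?: \d+ )      # integral part; the group opener is written with no spaces
    \.             # the decimal point
    \d*            # optional fractional digits
""", re.VERBOSE)

# The same pattern written compactly, for comparison.
compact = re.compile(r"(?:\d+)\.\d*")

print(number.fullmatch("3.14") is not None)
print(compact.fullmatch("3.14") is not None)
```

Written this way, the verbose and compact forms accept the same strings, and the whitespace problem described in the report does not arise.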

    @stevencollins
    Mannequin Author

    stevencollins mannequin commented Aug 11, 2012

    Fair enough, but in that case I still think the current behavior should be documented. Attached is a possible patch. (This is my first interaction with the Python issue tracker, by the way; apologies if I ought to have set some field differently or left some other field alone.)

    @stevencollins stevencollins mannequin added the docs Documentation in the Doc dir label Aug 11, 2012
    @stevencollins stevencollins mannequin changed the title re.VERBOSE doesn't ignore certain whitespace re.VERBOSE whitespace behavior not completely documented Aug 11, 2012
    @stevencollins stevencollins mannequin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Aug 11, 2012
    @serhiy-storchaka
    Member

    See also related bpo-11204.

    @ezio-melotti
    Member

    See also bpo-17184.

    @serhiy-storchaka
    Member

    Steven, would you mind updating your patch according to the review comments and creating a pull request on GitHub?

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 28, 2017
    @KevinShweh
    Mannequin

    KevinShweh mannequin commented Nov 10, 2017

    It looks to me like there are more situations than the patch lists where whitespace still separates tokens. For example, *? is a reluctant quantifier and * ? is a syntax error, even in verbose mode.
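The quantifier case can be checked the same way. This is a sketch under the assumption that any compile failure surfaces as `re.error`; the sample text and variable names are made up for illustration.

```python
import re

text = "<a><b>"

# Greedy "*": matches as much as possible.
greedy = re.compile(r" < . * > ", re.VERBOSE)
print(greedy.match(text).group())

# "*?" kept together: a reluctant quantifier, even with spaces around it.
lazy = re.compile(r" < . *? > ", re.VERBOSE)
print(lazy.match(text).group())

# "* ?" split by a space: the "?" is no longer read as part of the quantifier,
# and the pattern is rejected at compile time.
try:
    re.compile(r" < . * ? > ", re.VERBOSE)
    split_compiles = True
except re.error:
    split_compiles = False
print(split_compiles)
```

The space before `*?` is harmless because it falls between tokens; the space inside `* ?` is not.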

    @serhiy-storchaka
    Member

    Steven's patch is outdated since 71a0b43. But that commit missed that spaces are not ignored within tokens. PR 4366 fixes this by using the wording from Ezio's comments.

    @serhiy-storchaka
    Member

    New changeset b0b44b4 by Serhiy Storchaka in branch 'master':
    bpo-15606: Improve the re.VERBOSE documentation. (#4366)
    b0b44b4

    @serhiy-storchaka
    Member

    New changeset 14c1fe6 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
    bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4394)
    14c1fe6

    @serhiy-storchaka
    Member

    New changeset a2f1be0 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7':
    bpo-15606: Improve the re.VERBOSE documentation. (GH-4366) (#4395)
    a2f1be0
