re.compile("(.*$){1,4}", re.MULTILINE) fails #44451

Closed
doko42 opened this issue Jan 12, 2007 · 4 comments

@doko42
Member

doko42 commented Jan 12, 2007

BPO 1633953
Nosy @doko42, @serhiy-storchaka
Superseder
  • bpo-2537: re.compile(r'((x
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-08-19.20:33:07.099>
    created_at = <Date 2007-01-12.10:45:14.000>
    labels = ['expert-regex']
    title = 're.compile("(.*$){1,4}", re.MULTILINE) fails'
    updated_at = <Date 2013-08-19.20:33:07.097>
    user = 'https://github.com/doko42'

    bugs.python.org fields:

    activity = <Date 2013-08-19.20:33:07.097>
    actor = 'serhiy.storchaka'
    assignee = 'niemeyer'
    closed = True
    closed_date = <Date 2013-08-19.20:33:07.099>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2007-01-12.10:45:14.000>
    creator = 'doko'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 1633953
    keywords = []
    message_count = 4.0
    messages = ['61053', '74660', '116581', '195663']
    nosy_count = 6.0
    nosy_names = ['doko', 'niemeyer', 'timehorse', 'schmir', 'BreamoreBoy', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '2537'
    type = None
    url = 'https://bugs.python.org/issue1633953'
    versions = ['Python 2.7']

    @doko42
    Member Author

    doko42 commented Jan 12, 2007

    [forwarded from http://bugs.debian.org/289603]

    I was trying to match 1-4 lines of arbitrary content (as part of a larger regex) using the expression (.*$){1,4} with re.MULTILINE. Compiling this pattern makes the re module raise the error "nothing to repeat".

    $ python2.5
    Python 2.5 (release25-maint, Dec 13 2006, 16:21:45) 
    [GCC 4.1.2 20061212 (prerelease) (Ubuntu 4.1.1-21ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.compile("(.*$){1,4}", re.MULTILINE)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/re.py", line 180, in compile
        return _compile(pattern, flags)
      File "/usr/lib/python2.5/re.py", line 233, in _compile
        raise error, v # invalid expression
    sre_constants.error: nothing to repeat

    @timehorse
    Mannequin

    timehorse mannequin commented Oct 11, 2008

    At first blush, this issue sounds quite similar to bpo-2537, but after
    looking at different scenarios I found a subtle difference, because,
    grammatically:

    (?m)(?:.*$)(.*$)

    is the same as:

    (?m)(.*$){2}

    Yet the former compiles while the latter raises the exception you
    reported. So I think the issue you raise is indeed related to the
    redundant repeat operator issue, bpo-2537. However, when I match against
    the alternate form, I get an empty string in my capture group (in a
    range repeat over a capture group, only the last repetition is
    captured), while the entire expression matches only the first line,
    without the end-of-line character. The other thing to remember is
    that ^ and $ are zero-width matches, so when you write .*$, you are
    saying: match up to, but not including, the end of the line. If you
    immediately follow that with another .*$, you start from that same
    point, which means the next character is an end-of-line character.
    Thus, when you reach the second .*$, you capture nothing, because
    .* is allowed to be zero-length and you still haven't advanced past
    the end of the line.
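
The zero-width behaviour described above can be checked directly. This is a minimal sketch, assuming a two-line sample string; the name `text` and its contents are illustrative and do not come from the report.

```python
import re

# Illustrative two-line input; not taken from the original report.
text = "first\nsecond"

# The "alternate form": two consecutive .*$ units under MULTILINE.
m = re.search(r"(?m)(?:.*$)(.*$)", text)

# $ is zero-width, so the second .*$ starts right before "\n"
# and can only match the empty string there.
print(repr(m.group(0)))  # 'first'
print(repr(m.group(1)))  # ''
```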

    As a working alternative, you could write r'(?m)(?:(.*$)[\r\n]*){1,4}' ,
    since this would give you your 1-4 lines, but also consume the carriage
    return and line feed characters to get you to the next line.
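
A short sketch of the suggested alternative, assuming a five-line sample string (the names `text` and `pattern` are illustrative). It shows the repeat consuming four lines while the capture group keeps only the last repetition.

```python
import re

# Illustrative five-line input; not taken from the original report.
text = "a\nb\nc\nd\ne"

# Each repetition matches one line, then consumes its line ending,
# so the next repetition starts on the following line.
pattern = re.compile(r"(?m)(?:(.*$)[\r\n]*){1,4}")
m = pattern.match(text)

print(repr(m.group(0)))  # 'a\nb\nc\nd\n'
print(repr(m.group(1)))  # 'd' (only the last repetition is kept)
```

Because `[\r\n]*` is greedy, it also consumes any blank lines that immediately follow a matched line, so runs of empty lines are folded into the preceding repetition.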

    Since we don't want to change the meaning of $ and ^ to make them
    capture the line ending (custom POSIX character classes may make
    'capturing' a newline character easier), and the 'redundant repeat
    operator' is already listed as a bug (your expression is essentially
    saying (.*){1,4}$, because it does not capture the newline character(s)
    and thus has a redundant repeat operation inside the range repeat),
    I'm willing to call this a duplicate of bpo-2537 (technically, bpo-2537
    duplicates this one, as this issue is older).

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Sep 16, 2010

    Can this be closed as a duplicate of bpo-2537?

    @serhiy-storchaka
    Member

    Fixed in bpo-2537. See also bpo-18647.
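
One way to check whether a given interpreter accepts the reported pattern is to attempt the compile and catch `re.error`. This is only a sketch: `compiles` is a hypothetical helper written for illustration, not part of the re module, and the result for the reported pattern depends on the Python version in use.

```python
import re

def compiles(pattern, flags=0):
    """Hypothetical helper: True if re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern, flags)
    except re.error:
        return False
    return True

# The pattern from the report; the printed result depends on the interpreter.
print(compiles(r"(.*$){1,4}", re.MULTILINE))

# A pattern with a genuinely dangling repeat operator, for comparison.
print(compiles("*"))
```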

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022