Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re.compile(r'((x|y+)*)*') should not fail #46789

Closed
jorendorff mannequin opened this issue Apr 2, 2008 · 9 comments
Closed

re.compile(r'((x|y+)*)*') should not fail #46789

jorendorff mannequin opened this issue Apr 2, 2008 · 9 comments
Labels
stdlib Python modules in the Lib dir topic-regex type-bug An unexpected behavior, bug, or error

Comments

@jorendorff
Copy link
Mannequin

jorendorff mannequin commented Apr 2, 2008

BPO 2537
Nosy @ezio-melotti, @meadori, @serhiy-storchaka
Files
  • issue-2537.patch: patch against 2.7 trunk
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-08-19.20:43:29.768>
    created_at = <Date 2008-04-02.17:36:05.061>
    labels = ['expert-regex', 'type-bug', 'library']
    title = "re.compile(r'((x|y+)*)*') should not fail"
    updated_at = <Date 2013-08-19.20:43:29.766>
    user = 'https://bugs.python.org/jorendorff'

    bugs.python.org fields:

    activity = <Date 2013-08-19.20:43:29.766>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-08-19.20:43:29.768>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2008-04-02.17:36:05.061>
    creator = 'jorendorff'
    dependencies = []
    files = ['16207']
    hgrepos = []
    issue_num = 2537
    keywords = ['patch']
    message_count = 9.0
    messages = ['64865', '64934', '64950', '99194', '99237', '99248', '99251', '195662', '195664']
    nosy_count = 8.0
    nosy_names = ['rsc', 'timehorse', 'jorendorff', 'ezio.melotti', 'mrabarnett', 'meador.inge', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2537'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @jorendorff
    Copy link
    Mannequin Author

    jorendorff mannequin commented Apr 2, 2008

    Below, the second regexp seems just as guilty as the first to me.

    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> re.compile(r'((x|y)*)*')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File
    "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py",
    line 180, in compile
        return _compile(pattern, flags)
      File
    "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py",
    line 233, in _compile
        raise error, v # invalid expression
    sre_constants.error: nothing to repeat
    >>> re.compile(r'((x|y+)*)*')
    <_sre.SRE_Pattern object at 0x18548>

    I don't know if that error is to protect the sre engine from bad
    patterns or just a courtesy to users. If the former, it could be a
    serious bug.

    @mdickinson
    Copy link
    Member

    I'm almost tempted to call the first of these a bug: isn't '((x|y))'
    a perfectly valid (albeit somewhat redundant) regular expression? What
    am I missing here?

    Even if there are issues with capturing, shouldn't the version without
    capturing subexpressions still work?

    I get:

    >>> re.compile(r'(?:(?:x|y)*)*')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/re.py", line 180, in compile
        return _compile(pattern, flags)
      File "/usr/lib/python2.5/re.py", line 233, in _compile
        raise error, v # invalid expression
    sre_constants.error: nothing to repeat

    @jorendorff
    Copy link
    Mannequin Author

    jorendorff mannequin commented Apr 4, 2008

    Huh. Maybe you're right. JavaScript, Ruby, and Perl all accept both
    regexes, although no two agree on what should be captured:

    js> "xyyzy".replace(/((x|y))/, "($1, $2)")
    (xyy, y)zy
    js> "xyyzy".replace(/((x|y+))/, "($1, $2)")
    (xyy, yy)zy

    > "xyyzy".sub(/((x|y))/, "(\\1, \\2)")
    => "(, y)zy"
    > "xyyzy".sub(/((x|y+))/, "(\\1, \\2)")
    => "(, yy)zy"

    DB<1> $_ = 'xyyzy'; s/((x|y))/(\1 \2)/; print
    ( )zy
    DB<2> $_ = 'xyyzy'; s/((x|y+))/(\1 \2)/; print
    ( yy)zy

    Ruby's behavior seems best to me.

    @meadori
    Copy link
    Member

    meadori commented Feb 11, 2010

    Ruby's behavior seems best to me.

    We can obtain the Ruby behavior easily. There is one check in sre_compile.py in the '_simple' function that needs to be removed (see attached patch). Whether or not the Ruby behavior is the "correct" behavior I am still not sure. In any case, I think throwing an exception is to aggressive for this case.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Feb 11, 2010

    The re module is addressed in issue bpo-2636.

    BTW, my regex module behaves like Ruby:

    >>> regex.sub(r"((x|y)*)*", "(\\1, \\2)", "xyyzy", count=1)
    '(, y)zy'
    >>> regex.sub(r"((x|y+)*)*", "(\\1, \\2)", "xyyzy", count=1)
    '(, yy)zy'

    @meadori
    Copy link
    Member

    meadori commented Feb 12, 2010

    The re module is addressed in issue bpo-2636.

    Wow, that issue thread is massive... What about the 're' module is addressed? Is 'regex' replacing 're'? Is 'regex' being rolled into 're'? Are they both going to exist?

    @meadori meadori added topic-regex type-bug An unexpected behavior, bug, or error labels Feb 12, 2010
    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Feb 12, 2010

    The issue started about updating the re module and adding features that other languages already possess in their regex implementations (the last time any significant work was done on it was in 2003).

    The hope is that the new regex implementation will eventually replace the existing one, and putting it initially in a module called 'regex' allows it to be tested more easily.

    You can do:

    import regex as re

    and existing code should still work.

    @RamchandraApte RamchandraApte mannequin added stdlib Python modules in the Lib dir and removed topic-regex labels Nov 25, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 19, 2013

    New changeset 7ab07f15d78c by Serhiy Storchaka in branch '3.3':
    Issue bpo-2537: Remove breaked check which prevented valid regular expressions.
    http://hg.python.org/cpython/rev/7ab07f15d78c

    New changeset f4271cc2dfb5 by Serhiy Storchaka in branch 'default':
    Issue bpo-2537: Remove breaked check which prevented valid regular expressions.
    http://hg.python.org/cpython/rev/f4271cc2dfb5

    New changeset 7b867a46a8b4 by Serhiy Storchaka in branch '2.7':
    Issue bpo-2537: Remove breaked check which prevented valid regular expressions.
    http://hg.python.org/cpython/rev/7b867a46a8b4

    @serhiy-storchaka
    Copy link
    Member

    This issue is a duplicate of bpo-1633953. See also bpo-18647. After some fixes in other parts of the re module this check has become even more invalid.

    @serhiy-storchaka serhiy-storchaka changed the title re.compile(r'((x|y+)*)*') should fail re.compile(r'((x|y+)*)*') should not fail Aug 19, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-regex type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants