classification
Title: re.compile(r'((x|y+)*)*') should not fail
Type: behavior Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, jorendorff, meador.inge, mrabarnett, python-dev, rsc, serhiy.storchaka, timehorse
Priority: normal Keywords: patch

Created on 2008-04-02 17:36 by jorendorff, last changed 2013-08-19 20:43 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
issue-2537.patch meador.inge, 2010-02-11 03:17 patch against 2.7 trunk
Messages (9)
msg64865 - Author: Jason Orendorff (jorendorff) Date: 2008-04-02 17:36
Below, the second regexp seems just as guilty as the first to me.

Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r'((x|y)*)*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
>>> re.compile(r'((x|y+)*)*')
<_sre.SRE_Pattern object at 0x18548>

I don't know if that error is to protect the sre engine from bad
patterns or just a courtesy to users.  If the former, it could be a
serious bug.
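Whether a given interpreter still raises this error can be feature-detected. A minimal sketch, assuming a Python release that includes the fix recorded at the end of this thread; the helper name `compiles` is hypothetical:

```python
import re

def compiles(pattern):
    """Return True if re.compile() accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Both nested-star patterns from the report; on a fixed release both should be accepted.
for pattern in (r'((x|y)*)*', r'((x|y+)*)*'):
    print(pattern, compiles(pattern))

# A genuinely malformed pattern (a star with nothing before it) is still rejected.
print(r'*x', compiles(r'*x'))
```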
msg64934 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-04-04 17:41
I'm almost tempted to call the first of these a bug:  isn't '((x|y)*)*'
a perfectly valid (albeit somewhat redundant) regular expression?  What
am I missing here?

Even if there are issues with capturing, shouldn't the version without
capturing subexpressions still work?

I get:

>>> re.compile(r'(?:(?:x|y)*)*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
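Dropping the capturing groups changes only what gets recorded, not what the pattern matches as a whole. A short sketch, assuming a fixed Python release, comparing the overall match of the capturing and non-capturing forms on the subject string "xyyzy":

```python
import re

subject = "xyyzy"
capturing = re.compile(r'((x|y)*)*')
non_capturing = re.compile(r'(?:(?:x|y)*)*')

m1 = capturing.match(subject)
m2 = non_capturing.match(subject)

# The greedy outer star consumes "xyy" and stops at "z" in both cases;
# only the capturing form has groups to report.
print(m1.group(0), m1.span(), m1.groups())
print(m2.group(0), m2.span(), m2.groups())
```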
msg64950 - Author: Jason Orendorff (jorendorff) Date: 2008-04-04 20:37
Huh.  Maybe you're right.  JavaScript, Ruby, and Perl all accept both
regexes, although no two agree on what should be captured:

js> "xyyzy".replace(/((x|y)*)*/, "($1, $2)") 
(xyy, y)zy
js> "xyyzy".replace(/((x|y+)*)*/, "($1, $2)")
(xyy, yy)zy

>> "xyyzy".sub(/((x|y)*)*/, "(\\1, \\2)")
=> "(, y)zy"
>> "xyyzy".sub(/((x|y+)*)*/, "(\\1, \\2)")
=> "(, yy)zy"

  DB<1> $_ = 'xyyzy'; s/((x|y)*)*/(\1 \2)/; print
( )zy
  DB<2> $_ = 'xyyzy'; s/((x|y+)*)*/(\1 \2)/; print
( yy)zy

Ruby's behavior seems best to me.
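The same experiment can be run against Python's own re module for comparison with the three engines above. A sketch, assuming a fixed Python release; the hypothetical helper `show_groups` renders a group that did not participate as an empty string, so the call is well defined however this re version records captures inside repeated groups. No particular output is claimed here, since that recording is exactly what differs between engines:

```python
import re

def show_groups(pattern, subject="xyyzy"):
    """Replace the first match with '(g1, g2)', rendering unset groups as ''."""
    def render(m):
        g1, g2 = m.group(1), m.group(2)
        return "({}, {})".format(g1 or "", g2 or "")
    return re.sub(pattern, render, subject, count=1)

for pattern in (r'((x|y)*)*', r'((x|y+)*)*'):
    print(pattern, show_groups(pattern))
```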
msg99194 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-11 03:17
> Ruby's behavior seems best to me.

We can obtain the Ruby behavior easily. There is one check in sre_compile.py, in the '_simple' function, that needs to be removed (see attached patch). Whether or not the Ruby behavior is the "correct" behavior, I am still not sure. In any case, I think throwing an exception is too aggressive for this case.
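The check under discussion appears to have targeted repeats whose body can match the empty string with no upper bound on its width. The toy model below is a sketch, not the actual sre_parse/sre_compile code, of one plausible reason the two patterns in the report were treated differently. It assumes that widths are tracked as (min, max) pairs, that a repeat multiplies the body's maximum by MAXREPEAT without capping the result, and that the check compared the maximum with MAXREPEAT for equality. The helper names and the value 65535 are assumptions for illustration:

```python
# Toy model only: hypothetical helpers, not the real sre_parse API.
# Capturing groups do not change a subpattern's width, so they are left out.
MAXREPEAT = 65535  # assumed value of the old repeat limit

def lit():
    """A single literal character such as x or y: exactly one char wide."""
    return (1, 1)

def alt(*branches):
    """Alternation: narrowest minimum and widest maximum of the branches."""
    return (min(lo for lo, _ in branches), max(hi for _, hi in branches))

def star(body):
    """body*: minimum 0, maximum multiplied by MAXREPEAT (uncapped)."""
    return (0, body[1] * MAXREPEAT)

def plus(body):
    """body+: minimum of one copy, maximum multiplied by MAXREPEAT (uncapped)."""
    return (body[0], body[1] * MAXREPEAT)

def old_check_rejects(body):
    """Reject repeating a body whose width is exactly (0, MAXREPEAT)."""
    lo, hi = body
    return lo == 0 and hi == MAXREPEAT

inner_a = star(alt(lit(), lit()))        # models (x|y)*
inner_b = star(alt(lit(), plus(lit())))  # models (x|y+)*

print(inner_a, old_check_rejects(inner_a))  # (0, 65535) True
print(inner_b, old_check_rejects(inner_b))  # (0, 4294836225) False
```

Under these assumptions both inner subpatterns can match the empty string and are unbounded, yet only the first trips the equality test, which mirrors the inconsistency described in the original report.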
msg99237 - Author: Matthew Barnett (mrabarnett) * Date: 2010-02-11 20:49
The re module is addressed in issue #2636.

BTW, my regex module behaves like Ruby:

>>> regex.sub(r"((x|y)*)*", "(\\1, \\2)", "xyyzy", count=1)
'(, y)zy'
>>> regex.sub(r"((x|y+)*)*", "(\\1, \\2)", "xyyzy", count=1)
'(, yy)zy'
msg99248 - Author: Meador Inge (meador.inge) * (Python committer) Date: 2010-02-12 02:12
> The re module is addressed in issue #2636.

Wow, that issue thread is massive...  What about the 're' module is addressed?  Is 'regex' replacing 're'?  Is 'regex' being rolled into 're'?  Are they both going to exist?
msg99251 - Author: Matthew Barnett (mrabarnett) * Date: 2010-02-12 02:27
The issue started about updating the re module and adding features that other languages already possess in their regex implementations (the last time any significant work was done on it was in 2003).

The hope is that the new regex implementation will eventually replace the existing one, and putting it initially in a module called 'regex' allows it to be tested more easily.

You can do:

import regex as re

and existing code should still work.
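A defensive variant of this suggestion, sketched on the assumption that the third-party regex package may or may not be installed, falls back to the standard-library module:

```python
try:
    import regex as re  # third-party module from PyPI, used if installed
except ImportError:
    import re  # standard-library fallback

# Both modules provide compile() and a pattern-level sub() that accepts count=.
pattern = re.compile(r"((x|y+)*)*")
result = pattern.sub(lambda m: "[" + m.group(0) + "]", "xyyzy", count=1)
print(result)
```

Wrapping the whole match in brackets avoids depending on how either module records captures inside repeated groups, which is the point of disagreement earlier in this thread.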
msg195662 - Author: Roundup Robot (python-dev) Date: 2013-08-19 20:30
New changeset 7ab07f15d78c by Serhiy Storchaka in branch '3.3':
Issue #2537: Remove breaked check which prevented valid regular expressions.
http://hg.python.org/cpython/rev/7ab07f15d78c

New changeset f4271cc2dfb5 by Serhiy Storchaka in branch 'default':
Issue #2537: Remove breaked check which prevented valid regular expressions.
http://hg.python.org/cpython/rev/f4271cc2dfb5

New changeset 7b867a46a8b4 by Serhiy Storchaka in branch '2.7':
Issue #2537: Remove breaked check which prevented valid regular expressions.
http://hg.python.org/cpython/rev/7b867a46a8b4
msg195664 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-19 20:43
This issue is a duplicate of issue1633953. See also issue18647. After some fixes in other parts of the re module, this check has become even more invalid.
History
Date                 User              Action  Args
2013-08-19 20:43:29  serhiy.storchaka  set     status: open -> closed
                                               versions: + Python 3.3
                                               title: re.compile(r'((x|y+)*)*') should fail -> re.compile(r'((x|y+)*)*') should not fail
                                               messages: + msg195664
                                               resolution: fixed
                                               stage: resolved
2013-08-19 20:33:07  serhiy.storchaka  link    issue1633953 superseder
2013-08-19 20:30:24  python-dev        set     nosy: + python-dev
                                               messages: + msg195662
2012-11-27 22:47:00  ezio.melotti      set     nosy: + serhiy.storchaka
2012-11-25 15:47:41  ezio.melotti      set     components: + Regular Expressions
2012-11-25 15:12:53  mark.dickinson    set     nosy: - mark.dickinson
2012-11-25 14:23:22  Ramchandra Apte   set     components: + Library (Lib), - Regular Expressions
                                               versions: + Python 3.4
2010-02-12 02:27:16  mrabarnett        set     messages: + msg99251
2010-02-12 02:12:23  meador.inge       set     nosy: mark.dickinson, rsc, timehorse, jorendorff, ezio.melotti, mrabarnett, meador.inge
                                               type: behavior
                                               messages: + msg99248
                                               components: + Regular Expressions
2010-02-11 20:49:44  mrabarnett        set     nosy: + mrabarnett
                                               messages: + msg99237
2010-02-11 03:17:57  meador.inge       set     files: + issue-2537.patch
                                               nosy: + meador.inge
                                               messages: + msg99194
                                               keywords: + patch
2009-05-12 14:25:39  ezio.melotti      set     nosy: + ezio.melotti
2008-09-28 19:23:32  timehorse         set     nosy: + timehorse
                                               versions: + Python 2.7, - Python 2.6
2008-04-24 21:02:18  rsc               set     nosy: + rsc
2008-04-04 20:37:57  jorendorff        set     messages: + msg64950
2008-04-04 17:41:12  mark.dickinson    set     nosy: + mark.dickinson
                                               messages: + msg64934
2008-04-02 17:36:05  jorendorff        create