re.compile(r'((x|y+)*)*') should not fail #46789
Comments
Below, the second regexp seems just as guilty as the first to me.
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r'((x|y)*)*')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 180, in compile
return _compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
>>> re.compile(r'((x|y+)*)*')
<_sre.SRE_Pattern object at 0x18548>
I don't know if that error is to protect the sre engine from bad |
I'm almost tempted to call the first of these a bug: isn't '((x|y)*)*' a valid regular expression? Even if there are issues with capturing, shouldn't the version without capturing groups work? I get:
>>> re.compile(r'(?:(?:x|y)*)*')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/re.py", line 180, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.5/re.py", line 233, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat |
Huh. Maybe you're right. JavaScript, Ruby, and Perl all accept both patterns:
js> "xyyzy".replace(/((x|y)*)*/, "($1, $2)")
DB<1> $_ = 'xyyzy'; s/((x|y)*)*/(\1 \2)/; print
Ruby's behavior seems best to me. |
We can obtain the Ruby behavior easily. There is one check in sre_compile.py, in the '_simple' function, that needs to be removed (see attached patch). Whether or not the Ruby behavior is the "correct" behavior I am still not sure. In any case, I think throwing an exception is too aggressive for this case. |
The re module is addressed in issue bpo-2636. BTW, my regex module behaves like Ruby:
>>> regex.sub(r"((x|y)*)*", "(\\1, \\2)", "xyyzy", count=1)
'(, y)zy'
>>> regex.sub(r"((x|y+)*)*", "(\\1, \\2)", "xyyzy", count=1)
'(, yy)zy' |
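For comparison, here is a minimal sketch for inspecting what the standard re module captures for the same two patterns. It assumes an interpreter where both patterns compile (that is, one that already includes the fix discussed in this issue), and the helper name first_match_groups is made up for illustration. No expected group values are shown, because the captured groups are exactly what the engines disagree on.

```python
import re

def first_match_groups(pattern, text):
    """Return (whole match, group 1, group 2) for a match anchored at position 0.

    Both patterns can match the empty string, so re.match() never returns
    None here. Hypothetical helper, not part of the re module.
    """
    m = re.match(pattern, text)
    return m.group(0), m.group(1), m.group(2)

for pat in (r"((x|y)*)*", r"((x|y+)*)*"):
    # Print the groups so they can be compared with the regex/Ruby output above.
    print(pat, first_match_groups(pat, "xyyzy"))
```

The whole match is "xyy" in both cases; only the contents of groups 1 and 2 depend on how the engine handles the final empty iteration of the outer repeat.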
Wow, that issue thread is massive... What about the 're' module is addressed? Is 'regex' replacing 're'? Is 'regex' being rolled into 're'? Are they both going to exist? |
The issue started about updating the re module and adding features that other languages already possess in their regex implementations (the last time any significant work was done on it was in 2003). The hope is that the new regex implementation will eventually replace the existing one, and putting it initially in a module called 'regex' allows it to be tested more easily. You can do: import regex as re and existing code should still work. |
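As a rough sketch of the drop-in usage described above: the snippet below tries the third-party regex module first and falls back to the standard re module if it is not installed. The fallback is an assumption added for illustration, not something the comment above prescribes.

```python
try:
    # Third-party module; assumed to be installed separately.
    import regex as re
except ImportError:
    # Fall back to the standard library implementation.
    import re

# Existing code written against the re API keeps working unchanged.
pattern = re.compile(r"(x|y+)*")
m = pattern.match("xyyzy")
print(m.group(0))  # the greedy match stops before "z"
```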
New changeset 7ab07f15d78c by Serhiy Storchaka in branch '3.3':
New changeset f4271cc2dfb5 by Serhiy Storchaka in branch 'default':
New changeset 7b867a46a8b4 by Serhiy Storchaka in branch '2.7': |
This issue is a duplicate of bpo-1633953. See also bpo-18647. After some fixes in other parts of the re module this check has become even more invalid. |
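A quick way to check whether a given interpreter still rejects these patterns is sketched below. The function name compile_status is hypothetical; on interpreters that include the changesets above, all three patterns are expected to compile.

```python
import re

# The three patterns discussed in this issue.
PATTERNS = [
    r"((x|y)*)*",
    r"((x|y+)*)*",
    r"(?:(?:x|y)*)*",
]

def compile_status(pattern):
    """Return "ok" if the pattern compiles, else the compiler's error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "error: %s" % exc
    return "ok"

for p in PATTERNS:
    print(p, "->", compile_status(p))
```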