re module: strange behaviour of space inside {m, n} #55413
Comments
A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. The behaviour of r"b{1, 3}\Z" is not documented: it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.
# paste the following at the interactive prompt:
import re
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True
Suggested change, in decreasing order of preference:
Note: the expected way to deliberately match the literal text is to escape the left brace:
pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True
and this is not prevented by the suggested changes.
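The report above can be turned into a small self-checking script. This is only a sketch: the `CASES` table and the `check_cases` helper are names made up for illustration, and the expected values are simply the ones quoted in the report, not a statement about what re ought to do.

```python
import re

# Each entry: (pattern, text, flags, result reported in this issue).
CASES = [
    (r"b{1,3}\Z", "bb", 0, True),                     # ordinary quantifier
    (r"b{1, 3}\Z", "bb", 0, False),                   # space: not a quantifier
    (r"b{1, 3}\Z", "b{1, 3}", 0, True),               # matched as literal text
    (r"b{1, 3}\Z", "bb", re.VERBOSE, False),
    (r"b{1, 3}\Z", "b{1, 3}", re.VERBOSE, False),
    (r"b{1, 3}\Z", "b{1,3}", re.VERBOSE, True),       # literal, space dropped
    (r"b\{1, 3}\Z", "b{1, 3}", 0, True),              # explicit escape
]


def check_cases(cases=CASES):
    """Return the cases whose actual result differs from the reported one."""
    mismatches = []
    for pattern, text, flags, expected in cases:
        actual = re.match(pattern, text, flags) is not None
        if actual != expected:
            mismatches.append((pattern, text, flags, actual))
    return mismatches


if __name__ == "__main__":
    # An empty list means the interpreter behaves as described in the report.
    print(check_cases())
```

Running it on another Python version is a quick way to see whether the behaviour described here still holds.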
Interesting. In my regex module (http://pypi.python.org/pypi/regex) I have:
bool(regex.match(pat, "bb", regex.VERBOSE)) # True
bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False
because I thought that when the VERBOSE flag is turned on it should ignore whitespace except when it's inside a character class, so "b{1, 3}" would be treated as "b{1,3}". Apparently re has another exception.
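For comparison, here is a rough sketch of how the standard re module treats whitespace under VERBOSE in the places mentioned above. It uses only re (not the third-party regex module), and `verbose_matches` is a hypothetical helper name.

```python
import re


def verbose_matches(pattern, text):
    """True if `pattern`, compiled with re.VERBOSE, matches at the start of `text`."""
    return re.match(pattern, text, re.VERBOSE) is not None


# Outside a character class, unescaped whitespace in the pattern is ignored.
space_ignored = verbose_matches(r"a b\Z", "ab")
space_not_matched = verbose_matches(r"a b\Z", "a b")

# Inside a character class, or when backslash-escaped, the space is kept.
space_in_class = verbose_matches(r"a[ ]b\Z", "a b")
space_escaped = verbose_matches(r"a\ b\Z", "a b")

# Inside braces, the space stops {1, 3} from being read as a quantifier.
braces_as_quantifier = verbose_matches(r"b{1, 3}\Z", "bb")
braces_as_literal = verbose_matches(r"b{1, 3}\Z", "b{1,3}")

if __name__ == "__main__":
    print(space_ignored, space_not_matched, space_in_class,
          space_escaped, braces_as_quantifier, braces_as_literal)
```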
$ echo 'bbbbbaaa' | grep -o 'b\{1,3\}a'
bbba
$ echo 'bbbbbaaa' | grep -o 'b\{1, 3\}a'
grep: Invalid content of \{\}
$ echo 'bbbbbaaa' | egrep -o 'b{1,3}a'
bbba
$ echo 'bbbbbaaa' | egrep -o 'b{1, 3}a'
$ echo 'bbb{1, 3}aa' | LC_ALL=C egrep -o 'b{1, 3}a'
b{1, 3}a
I.e. grep raises an error, while egrep silently treats the braces as literal text. I don't know what any standards say about this.
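The egrep runs above can be mirrored with re.findall, which plays roughly the role of `grep -o` here. This is a sketch for comparison only; `grep_o` is a made-up helper name, and it says nothing about how grep itself is implemented.

```python
import re


def grep_o(pattern, line):
    """Return all non-overlapping matches of `pattern` in `line`, like `grep -o`."""
    return re.findall(pattern, line)


# Same inputs as the shell session above.
quantifier_hit = grep_o(r"b{1,3}a", "bbbbbaaa")
spaced_miss = grep_o(r"b{1, 3}a", "bbbbbaaa")
spaced_literal_hit = grep_o(r"b{1, 3}a", "bbb{1, 3}aa")

if __name__ == "__main__":
    print(quantifier_hit)      # one match, as with egrep
    print(spaced_miss)         # nothing found, as with egrep
    print(spaced_literal_hit)  # the literal text is found
```

On these inputs re behaves like egrep: it silently takes the literal reading and does not raise an error.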
The question is whether re should always treat 'b{1, 3}a' as a literal, even with the VERBOSE flag. I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_ makes it a literal, even with the 'x' flag (/b{1, 3}a/x is treated as /b\{1,3}a/).
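The equivalence described for Perl can be checked on the re side with a short sketch. It only exercises Python's re, not Perl, and the sample strings and the `compare` helper are chosen here purely for illustration.

```python
import re

# Under VERBOSE, the spaced form should behave like the explicitly escaped form.
spaced = re.compile(r"b{1, 3}a", re.VERBOSE)
escaped = re.compile(r"b\{1,3}a", re.VERBOSE)

SAMPLES = ["ba", "bba", "bbba", "b{1,3}a", "b{1, 3}a"]


def compare(samples=SAMPLES):
    """Map each sample to (spaced matches fully, escaped matches fully)."""
    return {
        s: (spaced.fullmatch(s) is not None, escaped.fullmatch(s) is not None)
        for s in samples
    }


if __name__ == "__main__":
    for sample, outcome in compare().items():
        print(repr(sample), outcome)
```

Both patterns accept only the text "b{1,3}a": the space is dropped by VERBOSE after the brace has already been taken as a literal.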
Then let's leave everything as it is.