re module: strange behaviour of space inside {m, n} #55413

Closed
sjmachin mannequin opened this issue Feb 12, 2011 · 5 comments
Labels
stdlib (Python modules in the Lib dir), topic-regex, type-bug (An unexpected behavior, bug, or error)

Comments

@sjmachin
Mannequin

sjmachin mannequin commented Feb 12, 2011

BPO 11204
Nosy @pitrou, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2014-09-14.19:40:45.730>
created_at = <Date 2011-02-12.23:19:56.198>
labels = ['expert-regex', 'type-bug', 'library']
title = 're module: strange behaviour of space inside {m, n}'
updated_at = <Date 2014-09-14.19:40:45.729>
user = 'https://bugs.python.org/sjmachin'

bugs.python.org fields:

activity = <Date 2014-09-14.19:40:45.729>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2014-09-14.19:40:45.730>
closer = 'serhiy.storchaka'
components = ['Library (Lib)', 'Regular Expressions']
creation = <Date 2011-02-12.23:19:56.198>
creator = 'sjmachin'
dependencies = []
files = []
hgrepos = []
issue_num = 11204
keywords = []
message_count = 5.0
messages = ['128472', '176812', '176813', '176819', '180700']
nosy_count = 6.0
nosy_names = ['sjmachin', 'roysmith', 'pitrou', 'ezio.melotti', 'mrabarnett', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue11204'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

@sjmachin
Mannequin Author

sjmachin mannequin commented Feb 12, 2011

A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. The behaviour of r"b{1, 3}\Z" is not documented: it matches the LITERAL TEXT "b{1, 3}" in normal mode and the literal text "b{1,3}" in verbose mode.

# paste the following at the interactive prompt:
import re
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True
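One way to see why these results differ is to compile the patterns with re.DEBUG, which makes CPython's re print its parse tree. The helper name parse_tree below is illustrative, not part of re, and the exact dump format may differ between Python versions. Without the space, the braces become a MAX_REPEAT 1 3 node; with the space, the parser gives up on the quantifier and treats "{" (code point 123) as a literal, and re.VERBOSE only drops the space afterwards.

```python
import contextlib
import io
import re

def parse_tree(pattern, flags=0):
    """Return the text CPython's re prints when compiling `pattern` with re.DEBUG."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # In CPython, DEBUG-flagged patterns are not cached, so the dump is printed every time.
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()

pat = r"b{1, 3}\Z"

# Contains "MAX_REPEAT 1 3": the braces form a real {1,3} quantifier.
print(parse_tree(r"b{1,3}\Z"))
# Contains "LITERAL 123" ('{') and "LITERAL 32" (' '): every character is a literal.
print(parse_tree(pat))
# Still contains "LITERAL 123", but verbose mode has dropped the space.
print(parse_tree(pat, re.VERBOSE))
```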

Suggested change, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour
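A minimal sketch of how options (1) and (2) could behave, written as a pre-processing step over the pattern string rather than as a change to re itself. The function fix_brace_quantifiers and its strict parameter are hypothetical. The sketch only looks at ASCII spaces, copies escaped characters and character classes unchanged, and does not understand verbose-mode comments.

```python
import re

# Hypothetical helper, not part of re: finds brace groups shaped like {m}, {m,},
# {,n} or {m,n} that contain ASCII spaces. Other whitespace is not handled.
_BRACE = re.compile(r"\{ *(?:(\d+)|(\d*) *, *(\d*)) *\}")

def fix_brace_quantifiers(pattern: str, strict: bool = False) -> str:
    """Option (1): drop the spaces. Option (2), strict=True: raise ValueError."""
    out = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            # Escaped characters, including \{, are copied unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            # Braces inside a character class are always literal; a ']' right
            # after '[' or '[^' is a class member, not the end of the class.
            in_class = True
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            out.append(pattern[i:j])
            i = j
            continue
        elif ch == "{":
            m = _BRACE.match(pattern, i)
            if m and " " in m.group():
                if strict:
                    raise ValueError(f"spaces inside {m.group()!r} at position {i}")
                single, lo, hi = m.groups()
                out.append("{%s}" % single if single is not None else "{%s,%s}" % (lo, hi))
                i = m.end()
                continue
        out.append(ch)
        i += 1
    return "".join(out)

print(fix_brace_quantifiers(r"b{1, 3}\Z"))   # b{1,3}\Z
print(fix_brace_quantifiers(r"b\{1, 3}\Z"))  # b\{1, 3}\Z (escaped brace left alone)
```

Raising surfaces the mistake early, while squeezing gives the pattern the meaning the user most likely intended; either way, a pattern that used to match literal text would start to mean something different.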

Note: to match the literal text deliberately, one would be expected to escape the left brace:

pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True

None of the suggested changes would prevent this.
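When the literal text comes from data rather than being typed by hand, re.escape() can produce the escaped form. A small sketch, assuming Python 3: re.escape backslash-escapes the braces and the space, and because an escaped space is kept even under re.VERBOSE, the escaped pattern matches the text literally in both modes. The variable names are illustrative.

```python
import re

literal_text = "b{1, 3}"

# re.escape backslash-escapes the braces and the space in literal_text.
escaped = re.escape(literal_text)
print(escaped)

# The escaped pattern matches the text itself, not "b", "bb" or "bbb".
print(bool(re.fullmatch(escaped, literal_text)))              # True
print(bool(re.fullmatch(escaped, "bb")))                      # False
# An escaped space is kept under re.VERBOSE, unlike the bare space in r"b\{1, 3}\Z".
print(bool(re.fullmatch(escaped, literal_text, re.VERBOSE)))  # True
```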

@serhiy-storchaka serhiy-storchaka added the stdlib, topic-regex and type-bug labels Dec 2, 2012
@mrabarnett
Mannequin

mrabarnett mannequin commented Dec 2, 2012

Interesting.

In my regex module (http://pypi.python.org/pypi/regex) I have:

import regex  # third-party module from PyPI
bool(regex.match(pat, "bb", regex.VERBOSE)) # True
bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False

because I thought that when the VERBOSE flag is turned on it should ignore whitespace except when it's inside a character class, so "b{1, 3}" would be treated as "b{1,3}".

Apparently re has another exception: inside {m,n} the space is not ignored, so the braces are treated as literal text.

@serhiy-storchaka
Member

$ echo 'bbbbbaaa' | grep -o 'b\{1,3\}a'
bbba
$ echo 'bbbbbaaa' | grep -o 'b\{1, 3\}a'
grep: Invalid content of \{\}
$ echo 'bbbbbaaa' | egrep -o 'b{1,3}a'
bbba
$ echo 'bbbbbaaa' | egrep -o 'b{1, 3}a'
$ echo 'bbb{1, 3}aa' | LC_ALL=C egrep -o 'b{1, 3}a'
b{1, 3}a

I.e. grep raises an error, while egrep silently gives the braces their literal meaning. I don't know what any standards say about this.

@mrabarnett
Mannequin

mrabarnett mannequin commented Dec 3, 2012

The question is whether re should always treat 'b{1, 3}a' as a literal, even with the VERBOSE flag.

I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_ makes it a literal, even with the 'x' flag (/b{1, 3}a/x is treated as /b\{1,3}a/).
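The same equivalence can be spot-checked for re on a few sample strings (a check on samples, not a proof): under re.VERBOSE, r"b{1, 3}a" accepts the same strings as the non-verbose r"b\{1,3}a", mirroring the Perl result above.

```python
import re

spaced_verbose = re.compile(r"b{1, 3}a", re.VERBOSE)   # like Perl's /b{1, 3}a/x
escaped_plain = re.compile(r"b\{1,3}a")                # like Perl's /b\{1,3}a/

samples = ["ba", "bbba", "b{1,3}a", "b{1, 3}a"]
for s in samples:
    # Both columns agree; only "b{1,3}a" matches.
    print(f"{s!r:12} {bool(spaced_verbose.fullmatch(s))} {bool(escaped_plain.fullmatch(s))}")
```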

@serhiy-storchaka
Member

Then let's leave everything as it is.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022