Issue 11204: re module: strange behaviour of space inside {m, n}

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/55413

classification

Title:	re module: strange behaviour of space inside {m, n}
Type:	behavior	Stage:	resolved
Components:	Library (Lib), Regular Expressions	Versions:	Python 3.2, Python 3.3, Python 3.4, Python 2.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, mrabarnett, pitrou, roysmith, serhiy.storchaka, sjmachin
Priority:	normal	Keywords:

Created on 2011-02-12 23:19 by sjmachin, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg128472 - (view)	Author: John Machin (sjmachin)	Date: 2011-02-12 23:19
A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.

# paste the following at the interactive prompt:
import re
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True

Suggested change, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour

Note: deliberately matching the literal text would be expected to be done by escaping the left brace:

pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True

and this is not prevented by the suggested changes.
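
For callers who want the effect of suggestion (1) without any change to re, a rough user-side sketch follows. The helper name normalize_quantifier_spaces is hypothetical and not part of the re module. It is deliberately naive: it only skips braces directly preceded by a backslash, does not understand character classes or doubled backslashes, and ignores the {,n} form.

```python
import re

# Hypothetical helper (an assumption for illustration, not part of re):
# matches an unescaped {m}, {m,} or {m,n} that may contain stray whitespace.
_SPACED_QUANTIFIER = re.compile(r"(?<!\\)\{\s*(\d+)\s*(?:(,)\s*(\d*)\s*)?\}")


def normalize_quantifier_spaces(pattern):
    """Return pattern with whitespace removed from inside {m,n} quantifiers."""
    def _tighten(match):
        # Groups 2 and 3 are None when the quantifier has no comma part.
        return "{" + match.group(1) + (match.group(2) or "") + (match.group(3) or "") + "}"
    return _SPACED_QUANTIFIER.sub(_tighten, pattern)


pat = r"b{1, 3}\Z"
fixed = normalize_quantifier_spaces(pat)
print(fixed)
print(bool(re.match(fixed, "bb")))

# An escaped left brace is left untouched, so the literal match still works.
pat2 = r"b\{1, 3}\Z"
print(normalize_quantifier_spaces(pat2) == pat2)
```

Because the helper rewrites the pattern text before compilation, it cannot distinguish a quantifier from a brace sequence inside a character class such as [{1, 3}], so it is only suitable for patterns the caller controls.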
msg176812 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2012-12-02 22:28
Interesting. In my regex module (http://pypi.python.org/pypi/regex) I have:

bool(regex.match(pat, "bb", regex.VERBOSE)) # True
bool(regex.match(pat, "b{1,3}", regex.VERBOSE)) # False

because I thought that when the VERBOSE flag is turned on it should ignore whitespace except when it's inside a character class, so "b{1, 3}" would be treated as "b{1,3}". Apparently re has another exception.
msg176813 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-12-02 22:40
$ echo 'bbbbbaaa' | grep -o 'b\{1,3\}a'
bbba
$ echo 'bbbbbaaa' | grep -o 'b\{1, 3\}a'
grep: Invalid content of \{\}
$ echo 'bbbbbaaa' | egrep -o 'b{1,3}a'
bbba
$ echo 'bbbbbaaa' | egrep -o 'b{1, 3}a'
$ echo 'bbb{1, 3}aa' | LC_ALL=C egrep -o 'b{1, 3}a'
b{1, 3}a

I.e. grep raises error and egrep chooses silent verbatim meaning. I don't know what any standards say about this.
msg176819 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2012-12-03 00:10
The question is whether re should always treat 'b{1, 3}a' as a literal, even with the VERBOSE flag. I've checked with Perl 5.14.2, and it agrees with re: adding a space _always_ makes it a literal, even with the 'x' flag (/b{1, 3}a/x is treated as /b\{1,3}a/).
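
The equivalence described above can be checked against re itself. This is a small sketch that assumes the behaviour reported in this thread; the variable names are illustrative only, and results could differ on an interpreter where the parsing of {m, n} has changed.

```python
import re

# With VERBOSE, the space in the pattern is dropped but the brace stays literal,
# so the spaced pattern should behave like the escaped-brace pattern.
spaced = r"b{1, 3}a"
escaped = r"b\{1,3}a"

samples = ("bba", "b{1, 3}a", "b{1,3}a")
results = []
for text in samples:
    verbose_spaced = bool(re.fullmatch(spaced, text, re.VERBOSE))
    plain_escaped = bool(re.fullmatch(escaped, text))
    results.append((text, verbose_spaced, plain_escaped))
    print(text, verbose_spaced, plain_escaped)
```

If the two columns agree for every sample, re is treating the spaced pattern under VERBOSE the same way as the escaped-brace form.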
msg180700 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-01-26 19:01
Then let's leave all as is.

History
Date	User	Action	Args
2022-04-11 14:57:12	admin	set	github: 55413
2014-09-14 19:40:45	serhiy.storchaka	set	status: pending -> closed resolution: rejected stage: resolved
2013-10-27 17:27:19	serhiy.storchaka	set	status: open -> pending
2013-02-11 20:02:21	roysmith	set	nosy: + roysmith
2013-01-26 19:01:21	serhiy.storchaka	set	messages: + msg180700
2012-12-03 00:10:46	mrabarnett	set	messages: + msg176819
2012-12-02 22:40:27	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg176813
2012-12-02 22:28:55	mrabarnett	set	messages: + msg176812
2012-12-02 21:53:47	serhiy.storchaka	set	nosy: + mrabarnett type: behavior components: + Library (Lib), Regular Expressions versions: + Python 3.2, Python 3.3, Python 3.4, - Python 3.1
2011-02-18 19:55:44	terry.reedy	set	nosy: + ezio.melotti, pitrou
2011-02-12 23:19:56	sjmachin	create