Issue 409311: Python 2.1b1 re module is broken!

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/34174

classification

Title:	Python 2.1b1 re module is broken!
Type:		Stage:
Components:	Regular Expressions	Versions:

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:
Assigned To:	effbot	Nosy List:	effbot, gregory.p.smith, moshez, tim.peters
Priority:	high	Keywords:

Created on 2001-03-17 03:40 by gregory.p.smith, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Messages (7)
msg3898 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2001-03-17 03:40
the following should -not- match:

$ python
Python 2.1b1 (#1, Mar 12 2001, 18:20:53)
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"
>>> str = '<dtml-var expr="Presentation.show(\'start\')">'
>>> import re
>>> re.match(reg, str)
<SRE_Match object at 0x810d9d0>

In python 1.5.2 and 2.0 this works fine.
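The report above can be turned into a small regression check. This is only a sketch, assuming a current Python 3 interpreter rather than the 2.1b1 build from the report. The names `pattern`, `text`, `result` and `ok` are illustrative, the subject string is assumed to sit on a single line, and the second input is a made-up tag added for contrast.

```python
import re

# Pattern from the report: a dtml-var tag containing only a bare name.
pattern = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"

# Subject from the report (assumed here to be one line, no embedded newline).
text = '<dtml-var expr="Presentation.show(\'start\')">'

# After "expr" comes "=", which neither the character class, \s*, nor ">"
# can consume, so a correct engine must report no match.
result = re.match(pattern, text)
print(result)

# Hypothetical input that does fit the pattern, for contrast.
ok = re.match(pattern, "<dtml-var title>")
print(ok.group(1))
```

On an engine without the bug, the first call returns `None` and the second captures the bare name.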
msg3899 - (view)	Author: Tim Peters (tim.peters) *	Date: 2001-03-18 06:20
Just adding a comment to force SF to send this as email (so I can read it).
msg3900 - (view)	Author: Tim Peters (tim.peters) *	Date: 2001-03-18 06:33
Assigned to Fredrik and boosted priority. Gregory, it's hard to see exactly what your str variable contains because there appears to be an embedded newline in it. Whatever, if I change your +? to the semantically equivalent + then the problem goes away for what I guessed you intended str to contain. The [a-z_0-9] part is also better written as \w (since you're using the ?i flag, same thing).
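The two rewrites suggested here can be compared side by side. The following is a rough sketch for a current Python 3 interpreter, not the code anyone in this thread ran. It compares the lazy `+?` form with its greedy counterpart `+` and with a `\w` spelling. The sample inputs and the helper `captured` are invented for illustration, and `re.ASCII` is added as an assumption so that `\w` stays limited to ASCII letters, digits and underscore under Python 3's Unicode-aware defaults.

```python
import re

# Original lazy form, its greedy counterpart, and the \w spelling.
lazy = re.compile(r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>")
greedy = re.compile(r"(?im)<dtml-var\s+([a-z_0-9]+)\s*>")
# re.ASCII is an assumption for Python 3: it keeps \w to [a-zA-Z0-9_].
word = re.compile(r"(?im)<dtml-var\s+(\w+)\s*>", re.ASCII)

# Hypothetical sample inputs, not taken from the report.
samples = [
    "<dtml-var title>",
    "<DTML-VAR  Title_2 >",
    '<dtml-var expr="x">',
    "<dtml-var >",
]


def captured(rx, s):
    """Return the captured name, or None when the pattern does not match."""
    m = rx.match(s)
    return m.group(1) if m else None


rows = [(captured(lazy, s), captured(greedy, s), captured(word, s)) for s in samples]
for s, row in zip(samples, rows):
    print(repr(s), row)
```

Because the name must be followed by optional whitespace and `>`, none of which the character class can match, the lazy and greedy forms are forced to capture the same text on these inputs.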
msg3901 - (view)	Author: Moshe Zadka (moshez)	Date: 2001-03-18 11:38
Here is a simpler test case which shows the same problem:

>>> str, r
('e=>', '(e+?)>')
>>> re.match(r, str)
<SRE_Match object at 0x4015f2e0>
>>> pre.match(r, str)
>>>

If we lose the laziness (make the pattern "(e+)>") then it works OK. So the crucial problem seems to be the compilation/execution of the lazy patterns, not the compilation/execution of character classes.
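The reduced case also shows what a lazy quantifier is supposed to change. Below is a minimal sketch for a current Python 3 interpreter. The old `pre` module used above for comparison is no longer shipped, so only `re` appears, and the variable names and the `"eee>"` input are illustrative additions.

```python
import re

text = "e=>"

# Reduced pattern from this message, in lazy and greedy form.
# The "=" between "e" and ">" cannot be consumed by either, so both must fail.
lazy = re.match(r"(e+?)>", text)
greedy = re.match(r"(e+)>", text)
print(lazy, greedy)

# Laziness only affects how much is consumed when several lengths would work.
shortest = re.match(r"(e+?)", "eee>").group(1)   # stops as early as possible
longest = re.match(r"(e+)", "eee>").group(1)     # takes as much as possible
forced = re.match(r"(e+?)>", "eee>").group(1)    # must grow until ">" follows
print(shortest, longest, forced)
```

A lazy quantifier may pick a shorter run than a greedy one, but it never permits skipping a character that the rest of the pattern cannot match, which is why a match on `'e=>'` indicates an engine bug.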
msg3902 - (view)	Author: Tim Peters (tim.peters) *	Date: 2001-03-18 18:58
So, Moshe, what's worse: floating-point or regexps <2/3 wink>? For the life of me, I'll never be able to read +? as a minimal match -- it's so clearly "match one or more, but optionally"!
msg3903 - (view)	Author: Fredrik Lundh (effbot) *	Date: 2001-03-21 19:03
same as #233283
msg3904 - (view)	Author: Fredrik Lundh (effbot) *	Date: 2001-03-22 17:15
fixed in 2.1b2

History
Date	User	Action	Args
2022-04-10 16:03:52	admin	set	github: 34174
2001-03-17 03:40:00	gregory.p.smith	create