classification
Title: Python 2.1b1 re module is broken!
Components: Regular Expressions

process
Status: closed
Resolution: duplicate
Assigned To: effbot
Nosy List: effbot, gregory.p.smith, moshez, tim.peters
Priority: high

Created on 2001-03-17 03:40 by gregory.p.smith, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Messages (7)
msg3898 - Author: Gregory P. Smith (gregory.p.smith) (Python committer) Date: 2001-03-17 03:40
The following should -not- match:

$ python
Python 2.1b1 (#1, Mar 12 2001, 18:20:53) 
[GCC 2.95.2 20000220 (Debian GNU/Linux)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"
>>> str = '<dtml-var
expr="Presentation.show(\'start\')">'
>>> import re                                
>>> re.match(reg, str)                       
<SRE_Match object at 0x810d9d0>


In Python 1.5.2 and 2.0 this works fine.
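
For anyone who wants to check the expected behaviour on a current interpreter, here is a minimal sketch. It assumes the reported string has a single whitespace character after `<dtml-var` (the original appears to have been wrapped in transit), and the names `subject` and `control` are made up for illustration.

```python
import re

reg = r"(?im)<dtml-var\s+([a-z_0-9]+?)\s*>"

# Assumption: one space stands in for whatever whitespace the report contained.
subject = '<dtml-var expr="Presentation.show(\'start\')">'
# Hypothetical control string that the pattern is meant to accept.
control = "<dtml-var title>"

# The name "expr" is followed by "=", not by optional whitespace and ">",
# so a correct engine should find no match here.
no_match = re.match(reg, subject)
good = re.match(reg, control)

print(no_match)        # None
print(good.group(1))   # title
```

A match object for `subject`, as in the session above, is the symptom being reported.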
msg3899 - Author: Tim Peters (tim.peters) (Python committer) Date: 2001-03-18 06:20
Just adding a comment to force SF to send this as email (so 
I can read it).
msg3900 - Author: Tim Peters (tim.peters) (Python committer) Date: 2001-03-18 06:33
Assigned to Fredrik and boosted priority.

Gregory, it's hard to see exactly what your str variable 
contains because there appears to be an embedded newline in 
it.  Whatever, if I change your

    +?

to the semantically equivalent

    *

then the problem goes away for what *I* guessed you 
intended str to contain.  The

    [a-z_0-9]

part is also better written as

    \w

(since you're using the ?i flag, same thing).
msg3901 - Author: Moshe Zadka (moshez) (Python triager) Date: 2001-03-18 11:38
Here is a simpler test case which shows the same
problem:

>>> str, r
('e=>', '(e+?)>')
>>> re.match(r, str)
<SRE_Match object at 0x4015f2e0>
>>> pre.match(r, str)
>>> 

If we lose the laziness (make the pattern "(e+)>") then it
works OK.

So the crucial problem seems to be the compilation/execution
of the lazy patterns, *not* the compilation/execution of
character classes.
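
A minimal sketch of the reduced case on a current interpreter, for comparison. The old `pre` module used in the session above is, as far as I know, no longer shipped, so only `re` is exercised; the variable names are illustrative.

```python
import re

subject = "e=>"

# In both patterns the "e" run must be followed directly by ">",
# but the subject has "=" in between, so neither form should match.
lazy = re.match(r"(e+?)>", subject)
greedy = re.match(r"(e+)>", subject)
print(lazy, greedy)          # None None

# Sanity check: the lazy form still grows as far as it needs to.
matched = re.match(r"(e+?)>", "ee>")
print(matched.group(1))      # ee
```

Laziness only changes which match is preferred when several are possible; it should never turn a failing match into a successful one.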
msg3902 - Author: Tim Peters (tim.peters) (Python committer) Date: 2001-03-18 18:58
So, Moshe, what's worse:  floating-point or regexps <2/3 
wink>?  For the life of me, I'll never be able to read +? 
as a minimal match -- it's so clearly "match one or more, 
but optionally"!
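
For the record, a small sketch of the two readings being joked about here, run against a current `re`. The variable names are illustrative only.

```python
import re

# "+" is greedy: it takes as many "e" characters as it can.
greedy = re.match(r"e+", "eee>").group()

# "+?" is the minimal form: still at least one "e", but as few as possible.
minimal = re.match(r"e+?", "eee>").group()

# "*" accepts zero repetitions, so it matches the empty string before ">".
star = re.match(r"e*", ">").group()

# "+?" still requires one "e", so it fails where "*" succeeds.
minimal_on_empty = re.match(r"e+?", ">")

print(repr(greedy), repr(minimal), repr(star), minimal_on_empty)
```

So `+?` and `*` are not interchangeable in general: the first needs at least one repetition, the second needs none.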
msg3903 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2001-03-21 19:03
same as #233283
msg3904 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2001-03-22 17:15
fixed in 2.1b2
History
Date                 User             Action  Args
2022-04-10 16:03:52  admin            set     github: 34174
2001-03-17 03:40:00  gregory.p.smith  create