classification
Title: re incompatibility in sre
Type: behavior
Stage: committed/rejected
Components: Regular Expressions
Versions: Python 2.6
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To: effbot
Nosy List: ajaksu2, effbot, georg.brandl, gvanrossum, loewis, timehorse, tmick
Priority: normal
Keywords:

Created on 2000-09-11 08:24 by loewis, last changed 2009-02-12 19:32 by ajaksu2. This issue is now closed.

Messages (8)
msg1325 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2000-09-11 08:24
[submitted by Adam Sampson]

Under Python 1.5.2, I had a script containing the following line:

m = re.match(r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)?", url)

Under 1.6, this fails with:

[...]
  File "/usr/local/lib/python1.6/sre.py", line 44, in match
    return _compile(pattern, flags).match(string)
  File "/usr/local/lib/python1.6/sre.py", line 102, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

I can narrow it down to:

>>> m = re.match(r"(x?)?", url)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python1.6/sre.py", line 44, in match
    return _compile(pattern, flags).match(string)
  File "/usr/local/lib/python1.6/sre.py", line 102, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

whereas:

>>> m = re.match(r"(x?.)?", url)

works fine. Is this correct behaviour for SRE, or am I just being stupid?
"(x?)?" looks like a perfectly reasonable Perl-style regexp to me (and Perl
too)...
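
The reduced case above can be checked against whatever interpreter is at hand. What follows is a minimal sketch, assuming a modern Python 3 `re` module; the helper name `compiles` is made up for illustration and is not part of `re` or `sre`.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles, False if re raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# The two reduced patterns from the report, plus the original one.
for pattern in (r"(x?)?", r"(x?.)?",
                r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)?"):
    print(pattern, "->", "ok" if compiles(pattern) else "error")
```

On an affected release the first and third lines would be expected to report an error, matching the "nothing to repeat" tracebacks above.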
msg1326 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2000-10-01 04:33
Martin, is this still broken in 2.0? Fredrik, any idea?
msg1327 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2000-10-01 18:13
Yes, it is still broken in 2.0b2.
msg1328 - Author: Trent Mick (tmick) Date: 2006-04-10 23:11
I've run into another incarnation of this (it breaks in
Python 2.3.5 and Python 2.4.3):

>>> import sre
>>> sre.compile("(a*)?")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\Lib\sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python24\Lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

Granted, the '?' here is redundant given the '*'
quantifier on 'a', but this regex compiles with
Python 2.3's "pre" module and it works in Perl.

The actual use case I've hit here is trying to compile all
the regexes in Fedora Core 5's SELinux config files
(/etc/selinux/targeted/contexts/files/file_contexts*). The
first such regex that broke was:
  '/usr/share/selinux-policy([^/]*)?/html(/.*)?'
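
Because a trailing '?' on a group whose body can already match the empty string does not change which strings are accepted, one possible workaround on affected interpreters is to drop it by hand. This is a sketch under that assumption; the sample path is invented for illustration, `accepts` is a hypothetical helper, and `re.fullmatch` needs Python 3.4 or later.

```python
import re

# Original SELinux pattern and a hand-edited variant without the redundant '?'.
original = r"/usr/share/selinux-policy([^/]*)?/html(/.*)?"
workaround = r"/usr/share/selinux-policy([^/]*)/html(/.*)?"

# Hypothetical sample path, not taken from the file_contexts files.
sample = "/usr/share/selinux-policy-1.0/html/index.html"

def accepts(pattern, text):
    """True if `pattern` matches all of `text`; None if it fails to compile."""
    try:
        return re.fullmatch(pattern, text) is not None
    except re.error:
        return None

print("workaround:", accepts(workaround, sample))
print("original:  ", accepts(original, sample))
```

Only acceptance is compared here, not the contents of the captured groups.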
msg1329 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2006-06-01 13:45
#1456280 is a duplicate of this.
msg1330 - Author: Collin Winter (collinwinter) (Python committer) Date: 2007-03-21 17:49
The original bug no longer applies to Python 2.5.0 or 2.6a0. Trent's bug still exists in Python 2.5.0 and 2.6a0 (where 2.6a0 == SVN r54478).
msg74683 - Author: Jeffrey C. Jacobs (timehorse) Date: 2008-10-13 13:05
The duplicate zero-or-one repeat operator bug originally described in
this issue no longer exists in Python 2.6.

However, Trent Mick brings up a fair point in that expressions of the 
form (x*)? generate an error (issue 1456280) when internally the '?' 
should be passively stripped from the expression by the Python Regular 
Expression Compiler because it is redundant.  The same goes for 
expressions of the form (x*)* (issue 2537).  Also, there is a problem 
with expressions of the form (x*){n,m} (issue 1633953), since the x* 
matches as much as it can, and thus it sees the range repeat operation 
as redundant -- in this case I think the range repeat should have the 
effect of matching (x*)(x*)(x*)... n to m times, but since the first 
time matches everything, the subsequent matches all match zero-width 
expressions following the first one.  I am tracking all of these issues 
under Item 33 of Issue 2636.
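
The three redundant-repeat forms listed above can be probed in one pass. This is a rough sketch, assuming Python 3; `probe` is a hypothetical helper, and the outcome for each pattern depends on the interpreter version, so no results are claimed here.

```python
import re

def probe(pattern, text):
    """Describe how `pattern` behaves on `text` with the running re module."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return ("error", str(exc))
    match = compiled.match(text)
    if match is None:
        return ("no match",)
    return ("match", match.span(), match.groups())

# (x*)?, (x*)* and (x*){n,m}, with plain (x*) as a baseline.
for pattern in (r"(x*)", r"(x*)?", r"(x*)*", r"(x*){2,3}"):
    print(pattern, probe(pattern, "xxxx"))
```

Comparing each reported span and group against the (x*) baseline shows whether the outer repeat changed the match or was effectively redundant.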

These are the 3 known redundant repeat issues, but this one, the
zero-or-one followed by zero-or-one, is AFAICT fixed in Python 2.6, as
the expression originally listed now compiles.
msg81807 - Author: Daniel Diniz (ajaksu2) Date: 2009-02-12 19:32
Jeffrey, Collin, thanks for reviewing.
History
Date User Action Args
2009-02-12 19:32:42 ajaksu2 set status: open -> closed
nosy: + ajaksu2
resolution: out of date
messages: + msg81807
stage: committed/rejected
2008-10-13 21:59:45 haypo set components: + Regular Expressions, - Extension Modules
2008-10-13 13:05:53 timehorse set messages: + msg74683
2008-10-01 14:19:22 collinwinter set nosy: - collinwinter
2008-09-28 19:30:21 timehorse set nosy: + timehorse
2007-09-10 21:04:28 brett.cannon set type: behavior
resolution: accepted -> (no value)
versions: + Python 2.6
2007-09-10 20:59:40 brett.cannon link issue1456280 superseder
2000-09-11 08:24:15 loewis create