classification
Title: re.finditer hangs on final empty match
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: niemeyer
Nosy List: effbot, kevinbutler, niemeyer
Priority: normal
Keywords:

Created on 2003-10-03 15:01 by kevinbutler, last changed 2004-09-03 18:13 by niemeyer. This issue is now closed.

Files
File name Uploaded Description
sre.patch niemeyer, 2004-09-03 18:13 Applied patch.
Messages (4)
msg18533 - (view) Author: Kevin J. Butler (kevinbutler) Date: 2003-10-03 15:01
The iterator returned by re.finditer does not appear to terminate if
the final match is empty; instead it keeps returning the final (empty)
match.

Is this a bug in _sre?  If so, I'll be happy to file it, though fixing
it is a bit beyond my _sre experience level at this point.  The
solution would appear to be either to check for a duplicate match in
iterator.next(), or to increment the position by one after returning an
empty match (which should be OK, because if a non-empty match started
at that location, we would have returned it instead of the empty
match).
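
The second option (advance one position after an empty match) can be sketched in pure Python. This is only an illustration written for Python 3, not the actual _sre code: the function name finditer_sketch is hypothetical, and it rebuilds the iteration on top of the documented pattern.search(string, pos) call.

```python
import re

def finditer_sketch(pattern, string):
    """Yield successive matches, stepping past empty matches (illustrative only)."""
    compiled = re.compile(pattern)
    pos = 0
    while pos <= len(string):
        m = compiled.search(string, pos)
        if m is None:
            return
        yield m
        if m.end() == m.start():
            # Empty match: move one position forward so the same
            # empty match is not returned again.
            pos = m.end() + 1
        else:
            pos = m.end()

spans = [m.span() for m in finditer_sketch(".*", "asdf")]
```

Here spans holds the full match followed by a single empty match at the end of the string, and the loop then stops because the position has moved past the end.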

Code to illustrate the failure:

from re import finditer

last = None
for m in finditer( ".*", "asdf" ):
    if last == m.span():
        print "duplicate match:", last
        break
    print m.group(), m.span()
    last = m.span()
   
---
asdf (0, 4)
 (4, 4)
duplicate match: (4, 4)
---

findall works:

import re
print re.findall( ".*", "asdf" )
['asdf', '']

The workaround is to explicitly check for a duplicate span, as I did
above, or to check for a duplicate end(), which also skips the final
empty match.
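
Either workaround could be packaged as a small wrapper around the iterator. The following is a sketch for Python 3 under stated assumptions: stop_on_repeat and stuck_iterator are hypothetical names, and stuck_iterator only imitates the reported behaviour (repeating the last match forever) so that the wrapper can be exercised on an interpreter where the bug is already fixed.

```python
import re

def stop_on_repeat(matches, by_end=False):
    """Yield matches until one repeats the previous span (or previous end())."""
    last = None
    for m in matches:
        key = m.end() if by_end else m.span()
        if key == last:
            break  # same place as the previous match: stop iterating
        last = key
        yield m

def stuck_iterator(pattern, string):
    """Hypothetical stand-in for the buggy iterator: repeats its last match forever."""
    found = list(re.finditer(pattern, string))
    for m in found:
        yield m
    while found:
        yield found[-1]

# Duplicate-span check: keeps the final empty match, then stops.
spans = [m.span() for m in stop_on_repeat(stuck_iterator(".*", "asdf"))]
# Duplicate-end() check: stops before the final empty match.
groups = [m.group() for m in stop_on_repeat(stuck_iterator(".*", "asdf"), by_end=True)]
```

The by_end variant is stricter: it also drops the legitimate empty match at the end of the string, which may or may not be what the caller wants.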

Seo Sanghyeon sent the following fix to the python-dev list:

The attached one-line patch fixes the re.finditer bug reported by
Kevin J. Butler. I read the cvs log to find out why this code was
introduced, and it seems to be related to SF bug #581080.

But that bug didn't reappear after my patch, so I wonder why the code
was introduced in the first place. It seems beyond my understanding.
Please enlighten me.

To test:

#581080
import re
list(re.finditer(r'\s', 'a b'))
# expected: one item list
# bug: hang

#Kevin J. Butler
import re
list(re.finditer('.*', 'asdf'))
# expected: two item list (?)
# bug: hang

Seo Sanghyeon
-------------- next part --------------
? patch
Index: Modules/_sre.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v
retrieving revision 2.99
diff -c -r2.99 _sre.c
*** Modules/_sre.c	26 Jun 2003 14:41:08 -0000	2.99
--- Modules/_sre.c	2 Oct 2003 03:48:55 -0000
***************
*** 3062,3069 ****
      match = pattern_new_match((PatternObject*) self->pattern,
                                 state, status);
  
!     if ((status == 0 || state->ptr == state->start) &&
!         state->ptr < state->end)
          state->start = (void*) ((char*) state->ptr + state->charsize);
      else
          state->start = state->ptr;
--- 3062,3068 ----
      match = pattern_new_match((PatternObject*) self->pattern,
                                 state, status);
  
!     if (status == 0 || state->ptr == state->start)
          state->start = (void*) ((char*) state->ptr + state->charsize);
      else
          state->start = state->ptr;
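
To see why dropping the `state->ptr < state->end` test matters, the branch can be modelled in Python. This is a simplified sketch of the condition shown in the diff, not the C code itself: positions are plain integers, charsize is assumed to be 1, and the function name next_start is made up for illustration.

```python
def next_start(status, ptr, start, end, patched):
    """Return where the next search would begin, following the diff's condition."""
    advance = (status == 0 or ptr == start)
    if not patched:
        # Revision 2.99 additionally required the match to end before
        # the end of the string.
        advance = advance and ptr < end
    # charsize is assumed to be 1 in this model.
    return ptr + 1 if advance else ptr

# Empty match at the very end of "asdf": start == ptr == end == 4.
before = next_start(status=1, ptr=4, start=4, end=4, patched=False)
after = next_start(status=1, ptr=4, start=4, end=4, patched=True)
```

With the old condition the start position never moves, so the same empty match at (4, 4) would be found again. With the patched condition the start moves past the end of the string. That the following search then fails and ends the iteration is a reading of the report, not something the diff itself shows.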
msg18534 - (view) Author: Kevin J. Butler (kevinbutler) Date: 2003-10-03 18:16
Logged In: YES 
user_id=117665

The above patch does resolve the problem.

The code was introduced in rev 2.85
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_sre.c
to resolve bug 581080
http://sourceforge.net/tracker/index.php?func=detail&aid=581080&group_id=5470&atid=105470
but removing this line does not re-introduce that bug.

Thanks, and kudos to Seo...
msg18535 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2004-09-03 12:04
Logged In: YES 
user_id=38376

Still there in 2.4a3, as the following revised example shows:

import re

m = re.finditer(".*", "asdf")

print m.next().span()
print m.next().span()
print m.next().span() # this should raise an exception

Gustavo, can you look at this patch too?
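
On Python 3, where iterators have no .next() method, the same check could be written with the built-in next(). This is a sketch for a current interpreter with the fix in place, not part of the original message, and the variable names are arbitrary.

```python
import re

it = re.finditer(".*", "asdf")

first = next(it).span()   # the full match
second = next(it).span()  # the final empty match

try:
    next(it)              # a fixed interpreter has nothing left to return
    exhausted = False
except StopIteration:
    exhausted = True
```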
msg18536 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2004-09-03 18:13
Logged In: YES 
user_id=7887

Patch applied and test cases added to check this bug and also for 
#581080. 
 
Kevin and Seo, thanks for the bug report and the fix. 
 
Fredrik, thanks for pointing me to the issue. 
 
Applied as: 
 
Lib/test/test_re.py: 1.52 
Modules/_sre.c: 2.108 
 
Patch attached for reference. 
 
History
Date User Action Args
2003-10-03 15:01:52 kevinbutler create