classification
Title: SRE bugs with capturing groups in negative assertions
Type: compile error Stage:
Components: Regular Expressions Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: niemeyer Nosy List: daniel_py, draghuram, georg.brandl, glchapman, niemeyer
Priority: normal Keywords:

Created on 2003-04-21 18:22 by glchapman, last changed 2009-02-23 21:48 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description
neg_groups_patch.txt glchapman, 2003-04-21 18:22
assert_patch.txt glchapman, 2003-04-22 18:46 new patch replacing previous with added fix for pos. assertions
ftp_eme.py daniel_py, 2009-02-23 19:14 Transferring Files via FTP
Messages (7)
msg15565 - Author: Greg Chapman (glchapman) Date: 2003-04-21 18:22
SRE is broken in some subtle ways when you combine 
capturing groups with assertions.  For example:

>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')

In the above, '(a)' has matched an empty string. Or
worse:

>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

Here '(a)' matches 'b'.

Although Perl reports matches for groups in negative 
assertions, I think it is better to adopt the PCRE rule 
that these groups are always reported as unmatched 
outside the assertion (inside the assertion, if used with 
backreferences, they should behave as normal).  This 
would make the handling of subpatterns in negative 
assertions consistent with that of subpatterns in 
branches:

>>> re.match('(a)c|ab', 'ab').groups()
(None,)

In the above, although '(a)' matches before the branch 
fails, the failure of the branch means '(a)' is considered 
not to have matched.
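The branch rule can be checked directly with the `re` module. In the sketch below (the helper name `show` is illustrative, not part of `re`), only the branch case is treated as fixed, since a group inside an abandoned alternative is reported as unmatched in every SRE version; the assertion cases from this report are printed rather than asserted, because what they return depends on the interpreter version.

```python
import re
from typing import Optional, Tuple

def show(pattern: str, text: str) -> Optional[Tuple]:
    """Return the groups of re.match(pattern, text), or None if it fails."""
    m = re.match(pattern, text)
    return m.groups() if m else None

# '(a)' matches inside the first alternative, but that branch is abandoned
# when 'c' fails against 'b', so group 1 is reported as unmatched.
branch = show('(a)c|ab', 'ab')
print('(a)c|ab', branch)  # (None,)

# The assertion cases from this report; their output varies across SRE
# versions, so it is printed for inspection rather than asserted.
for pat, txt in [('((?!(a)c)[ab])*', 'abc'), ('(a)((?!(b)*))*', 'abb')]:
    print(pat, txt, show(pat, txt))
```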

Anyway, the attached patch is an effort to fix this 
problem by saving the values of marks before calling the 
assertion, and then restoring them afterwards (thus 
undoing whatever might have been done in the assertion).
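The save-and-restore approach can be modelled in a few lines. This is a toy sketch, not the C code in the attached patch: marks are a flat list holding a start and end offset per group, and `assert_not`, `body_a_then_c` and the `discard_inner_marks` flag are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional

# Toy representation: marks[2*(g-1)] and marks[2*(g-1)+1] hold the start and
# end offsets of group g, or None when the group is unset.
Marks = List[Optional[int]]

def assert_not(marks: Marks, body: Callable[[Marks], bool],
               discard_inner_marks: bool = True) -> bool:
    """Toy negative lookahead: succeed (return True) when `body` fails.

    If discard_inner_marks is True, the marks are snapshotted before `body`
    runs and put back afterwards, so nothing the assertion wrote is visible
    outside it. If False, the body's writes are left in place.
    """
    saved = list(marks)          # snapshot before entering the assertion
    matched = body(marks)
    if discard_inner_marks:
        marks[:] = saved         # undo whatever the assertion body did
    return not matched

def body_a_then_c(marks: Marks) -> bool:
    """Stands in for '(a)c' tried at offset 0 of 'abc': '(a)' matches, 'c' does not."""
    marks[2], marks[3] = 0, 1    # group 2 = 'a' at offsets 0..1
    return False                 # 'c' fails against 'b'

restored: Marks = [None] * 4
print(assert_not(restored, body_a_then_c), restored)   # True [None, None, None, None]

kept: Marks = [None] * 4
print(assert_not(kept, body_a_then_c, discard_inner_marks=False), kept)  # True [None, None, 0, 1]
```

With the snapshot restored, the failed `(a)` leaves no trace outside the assertion; without it, group 2 keeps the span written inside the assertion even though the assertion's body failed.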
msg15566 - Author: Greg Chapman (glchapman) Date: 2003-04-22 18:46

In thinking further, I realized that positive assertions are also 
affected by the second problem.  E.g.:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)

The problem here is that a successful match in an assertion 
can leave marks at the top of the mark stack which then get 
popped in the wrong place.  Attaching a new patch which 
should catch this problem for both kinds of assertions (and 
which also should "unmark" groups in negative assertions).
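The stack problem can be illustrated with a simplified model. This is a hypothetical sketch (the class and function names are mine, and the push/pop sequence only approximates what SRE does for `(a)(?:(?=(b)*)c)*` on `'abb'`): a successful assertion leaves snapshots on the stack, and a restore that blindly pops from the top hands group 1 the wrong marks. Truncating the stack back to the height recorded at save time before popping gives group 1 its own marks back.

```python
from typing import List, Optional

Marks = List[Optional[int]]

class MarkStack:
    """Toy mark stack; not the data structure used by _sre.c."""

    def __init__(self) -> None:
        self.data: Marks = []

    def save(self, marks: Marks, lo: int, hi: int) -> int:
        """Push a copy of marks[lo:hi]; return the stack height before the push."""
        base = len(self.data)
        self.data.extend(marks[lo:hi])
        return base

    def restore(self, marks: Marks, lo: int, hi: int,
                base: Optional[int] = None) -> None:
        """Pop a snapshot back into marks[lo:hi].

        With base=None the top n entries are popped, whoever pushed them
        (the faulty behaviour). With the base returned by save(), entries
        pushed after that save are discarded first."""
        n = hi - lo
        if base is not None:
            del self.data[base + n:]
        marks[lo:hi] = self.data[-n:]
        del self.data[-n:]

def group1_after_failed_iteration(fixed: bool) -> str:
    text = "abb"
    marks: Marks = [0, 1, None, None]   # group 1 = text[0:1]; group 2 unset
    stack = MarkStack()
    outer = stack.save(marks, 0, 2)     # loop snapshots group 1 before an iteration

    # Inside (?=(b)*): each (b)* iteration snapshots group 2 first.
    stack.save(marks, 2, 4)
    marks[2:4] = [1, 2]                 # first 'b'
    stack.save(marks, 2, 4)
    marks[2:4] = [2, 3]                 # second 'b'
    inner = stack.save(marks, 2, 4)
    stack.restore(marks, 2, 4, inner if fixed else None)  # third 'b' fails
    # The assertion succeeds, leaving two group-2 snapshots on the stack.

    # 'c' then fails, so the loop restores group 1 from its snapshot.
    stack.restore(marks, 0, 2, outer if fixed else None)
    return text[marks[0]:marks[1]]

print(group1_after_failed_iteration(fixed=False))  # b
print(group1_after_failed_iteration(fixed=True))   # a
```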
msg15567 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-04-27 13:28

Greg, I think there are two different issues here.

One of them is related to a wrong behavior from
mark_save/restore(), which don't restore the stackbase
before restoring the marks. Asserts were affected because
they're the only recursive ops that can continue the loop,
but the problem would happen to any operation with the same
behavior. So, rather than hardcoding this into asserts, I
have changed mark_save/restore() to always restore the
stackbase before restoring the marks. This should fix these
two cases you presented:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
>>> re.match('(a)((?!(b)*))*', 'abb').groups()

And was applied as:

Modules/_sre.c: 2.95
Lib/test/test_re.py: 1.41
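The two cases above can be turned into a small regression check. This is a sketch: the expected tuples are, to my knowledge, what current CPython returns for these patterns (group 1 stays `'a'` and never picks up `'b'`), and the names `CASES` and `failures` are illustrative rather than taken from `test_re.py`.

```python
import re

# (pattern, subject, groups expected once marks are restored correctly)
CASES = [
    ('(a)(?:(?=(b)*)c)*', 'abb', ('a', None)),
    ('(a)((?!(b)*))*', 'abb', ('a', None, None)),
]

def failures():
    """Return (pattern, got, expected) for every case that does not match."""
    bad = []
    for pattern, text, expected in CASES:
        got = re.match(pattern, text).groups()
        if got != expected:
            bad.append((pattern, got, expected))
    return bad

if __name__ == '__main__':
    print(failures() or 'all cases pass')
```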

The other issue is related to the asserts which are leaving
half-marked groups. While your solution does work, it
changes the current behavior, which is also compatible with
how Perl works. I understand that individual groups matching
when the whole string doesn't match is atypical. OTOH, a
negative assertion *is* atypical, and IMO denying external
group access won't help the user to understand how it works.
In other words, I think it's better to have incomplete
support for the moment, than having none.
This way, we can think further about this, and look for an
elegant solution to fix that support, certainly including
some algorithm to check for half-marked groups.

Thank you very much for spotting these bugs, and submitting
a solution for them.
msg15568 - Author: Greg Chapman (glchapman) Date: 2003-04-28 15:46

Gustavo,

Just a quick note on compatibility.  As I mentioned, the PCRE 
rule is that groups in negative asserts do not match.  So, with 
Python 2.2, you get this result using pre:

>>> pre.match('((?!(a)c)[ab])*', 'abc').groups()
('b', None)

Although I understand your argument, I'll just say that I 
personally think it would be better to move toward compatibility 
with pre (which after all was the official Python regex engine 
before sre), rather than to remain partly compatible with Perl.

Thanks for reviewing my patches and fixing the bugs!
msg62179 - Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2008-02-07 20:50
Looks to have been fixed.
msg82636 - Author: daniel garay (daniel_py) Date: 2009-02-23 19:14
I need help resolving this problem, which occurs when I run a program
as a scheduled task with crontab. The result is:

Traceback (most recent call last):
  File "ftp_eme.py", line 12, in ?
    import datetime, os, ftplib
  File "/usr/local/lib/python2.6/ftplib.py", line 46, in ?
    import socket
  File "/usr/local/lib/python2.6/socket.py", line 50, in ?
    import _ssl
  File "/usr/local/lib/python2.6/_ssl.py", line 58, in ?
    import textwrap
  File "/usr/local/lib/python2.6/textwrap.py", line 10, in ?
    import string, re
  File "/usr/local/lib/python2.6/string.py", line 81, in ?
    import re as _re
  File "/usr/local/lib/python2.6/re.py", line 105, in ?
    import sre_compile
  File "/usr/local/lib/python2.6/sre_compile.py", line 17, in ?
    assert _sre.MAGIC == MAGIC, "SRE module mismatch"
AssertionError: SRE module mismatch
msg82644 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-02-23 21:48
Please do not hijack existing issues. In the case of this problem, do
not open an issue at all, but ask in a Python mailing list or newsgroup.
History
Date User Action Args
2009-02-23 21:48:45 georg.brandl set nosy: + georg.brandl
messages: + msg82644
components: + Regular Expressions, - Windows
title: _sre.MAGIC -> SRE bugs with capturing groups in negative assertions
2009-02-23 19:14:09 daniel_py set files: + ftp_eme.py
versions: + Python 2.6, - Python 2.3
nosy: + daniel_py
title: SRE bugs with capturing groups in negative assertions -> _sre.MAGIC
messages: + msg82636
components: + Windows, - Regular Expressions
type: compile error
2008-02-07 20:50:26 draghuram set status: open -> closed
resolution: fixed
messages: + msg62179
nosy: + draghuram
2003-04-21 18:22:00 glchapman create