classification
Title: Grouprefs in lookbehind assertions
Type: behavior
Stage: patch review
Components: Regular Expressions
Versions: Python 3.2
process
Status: open
Resolution:
Dependencies:
Superseder: Lookback with group references incorrect (two issues?) (issue 9179)
Assigned To: effbot
Nosy List: effbot, glchapman, mrabarnett, serhiy.storchaka
Priority: normal
Keywords:

Created on 2003-09-29 03:31 by glchapman, last changed 2014-10-24 12:05 by serhiy.storchaka.

Files
File name Uploaded Description
sre_parse.patch glchapman, 2003-11-02 15:31
Messages (7)
msg18411 - Author: Greg Chapman (glchapman) Date: 2003-09-29 03:31
I was trying to get a pattern like this to work:

   import re
   pat = re.compile(r'(?<=(...)\1)abc')
   pat.match('jkljklabc', 6)

Unfortunately, that doesn't work.  The problem is that
sre_parse.Subpattern.getwidth() ignores GROUPREFs
when calculating the width, so the subpattern in the
assertion is deemed to have a length of 3 rather than 6.
(I was hoping that sre could detect that group 1 has a
fixed length, so that the reference to it would also have
a fixed length.)

I've since discovered that both Perl and PerlRE cannot 
handle the above pattern, but they both generate 
exceptions indicating that the assertion has a variable 
length pattern.  I think it would be a good idea if sre 
generated an exception as well (rather than silently 
ignoring GROUPREFs).
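
A minimal sketch of the behaviours described above, using only the standard `re` module. The helper name `try_compile` is made up for illustration. How the pattern from this report is handled depends on the version of `re` in use, so the sketch only reports the outcome instead of assuming one. The second pattern shows the kind of exception raised for a lookbehind whose width is clearly not fixed.

```python
import re

def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, error_text) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)

# The pattern from this report: a group reference inside a lookbehind.
# Whether it compiles depends on the re version in use.
pat, err = try_compile(r'(?<=(...)\1)abc')
if pat is None:
    print('rejected:', err)
else:
    print('compiled; match at 6:', pat.match('jkljklabc', 6))

# A lookbehind with an obviously variable width is rejected by re.
pat2, err2 = try_compile(r'(?<=a+)b')
print('variable-width lookbehind:', err2)

# A plain fixed-width lookbehind is accepted.
pat3, err3 = try_compile(r'(?<=jkl)abc')
print('fixed-width lookbehind matches at:', pat3.search('jkljklabc').start())
```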

msg18412 - Author: Greg Chapman (glchapman) Date: 2003-11-02 15:31
Attached is a patch which gives GROUPREFs an arbitrary 
variable width, so that they raise an exception if used in a 
lookbehind assertion.  Obviously, it would be better if 
GROUPREFs returned the length of the group to which they 
refer, but I don't see any obvious way for getwidth() to get 
that information (perhaps I missed something?).
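
To make the two options concrete, here is a toy width calculation. It is not the attached patch and not the code in sre_parse: the tuple-based pattern representation, the `MAXWIDTH` constant, and the function names are all assumptions of this sketch. A group reference takes the recorded width of its group when that group has already been seen, and otherwise falls back to an arbitrary variable width, which makes the enclosing pattern count as variable-length.

```python
MAXWIDTH = 2**32  # stand-in for "unbounded"; an assumption of this sketch

def getwidth(items, group_widths=None):
    """Return (lo, hi) width bounds for a toy pattern.

    items is a list of tuples:
      ('literal', n)            exactly n characters
      ('repeat', lo, hi, sub)   sub repeated lo..hi times
      ('group', gid, sub)       capture group gid wrapping sub
      ('groupref', gid)         back-reference to group gid
    """
    if group_widths is None:
        group_widths = {}
    lo = hi = 0
    for item in items:
        kind = item[0]
        if kind == 'literal':
            l = h = item[1]
        elif kind == 'repeat':
            sub_lo, sub_hi = getwidth(item[3], group_widths)
            l, h = sub_lo * item[1], min(sub_hi * item[2], MAXWIDTH)
        elif kind == 'group':
            l, h = getwidth(item[2], group_widths)
            group_widths[item[1]] = (l, h)  # remember for later references
        elif kind == 'groupref':
            # Use the recorded width if the group has been seen; otherwise
            # fall back to "anything", which makes the pattern variable-width.
            l, h = group_widths.get(item[1], (0, MAXWIDTH))
        else:
            raise ValueError('unknown item: %r' % (kind,))
        lo, hi = lo + l, min(hi + h, MAXWIDTH)
    return lo, hi

def is_fixed_width(items):
    """True when the lower and upper width bounds coincide."""
    lo, hi = getwidth(items)
    return lo == hi

# (...)\1 : a group of three characters followed by a reference to it.
print(getwidth([('group', 1, [('literal', 3)]), ('groupref', 1)]))   # (6, 6)
# A reference to a group that has not been seen yet is treated as variable.
print(is_fixed_width([('groupref', 1), ('literal', 3)]))             # False
```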
msg83234 - Author: Matthew Barnett (mrabarnett) * Date: 2009-03-06 03:11
As part of issue #2636 group references now work in lookbehinds.

However, your example:

    (?<=(...)\1)abc

will fail but:

    (?<=\1(...))abc

will succeed.

Why? Well, in lookbehinds it searches backwards. In the first regex it
sees the group reference before the capture, whereas in the second it
sees the group reference after the capture. (Hope that's clear! :-))
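
The ordering argument can be illustrated with a toy model. This is only a sketch of the explanation above, not the matcher from issue #2636: the token lists and the function `unresolved_refs` are hypothetical. Each pattern is reduced to its captures and references, and the scan reports any reference that is reached before its capture.

```python
def unresolved_refs(items, backwards=False):
    """Return group ids that are referenced before their capture in scan order.

    items is a list of ('group', gid) and ('ref', gid) tokens in pattern order.
    """
    order = reversed(items) if backwards else items
    seen = set()
    problems = []
    for kind, gid in order:
        if kind == 'group':
            seen.add(gid)
        elif kind == 'ref' and gid not in seen:
            problems.append(gid)
    return problems

first = [('group', 1), ('ref', 1)]    # (?<=(...)\1)
second = [('ref', 1), ('group', 1)]   # (?<=\1(...))

# Scanning backwards, as a lookbehind does in this model:
print(unresolved_refs(first, backwards=True))   # [1]  reference met first
print(unresolved_refs(second, backwards=True))  # []   capture met first
```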
msg114290 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-18 22:39
I've deliberately changed the stage to patch review and the version to 3.2 to highlight the fact that a lot of work will be needed to get the new regex engine into the standard library.  Feel free to change these as you see fit.
msg190042 - Author: Mark Lawrence (BreamoreBoy) * Date: 2013-05-26 00:41
Can this be closed as a result of work done via #2636 or must it remain open?
msg190044 - Author: Matthew Barnett (mrabarnett) * Date: 2013-05-26 00:57
Issue #2636 resulted in the regex module, which supports variable-length look-behinds.

I don't know how much work it would take even to put a limited fixed-length look-behind fix for this into the re module, so I'm afraid the issue must remain open.
msg229918 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-24 12:05
The patch for issue9179 fixes this issue too.
History
Date User Action Args
2014-10-24 12:05:32 serhiy.storchaka set superseder: Lookback with group references incorrect (two issues?); messages: + msg229918; nosy: + serhiy.storchaka
2014-02-03 17:13:53 BreamoreBoy set nosy: - BreamoreBoy
2013-05-26 00:57:26 mrabarnett set messages: + msg190044
2013-05-26 00:42:00 BreamoreBoy set messages: + msg190042
2010-08-18 22:39:59 BreamoreBoy set versions: + Python 3.2, - Python 2.7; nosy: + BreamoreBoy; messages: + msg114290; stage: patch review
2010-08-03 19:37:32 terry.reedy set versions: + Python 2.7, - Python 2.5, Python 2.4, Python 2.3
2009-03-06 03:11:25 mrabarnett set nosy: + mrabarnett; messages: + msg83234
2007-09-11 06:25:46 effbot set type: behavior; versions: + Python 2.5, Python 2.4
2003-09-29 03:31:55 glchapman create