Title: Grouprefs in lookbehind assertions
Type: behavior Stage: patch review
Components: Regular Expressions Versions: Python 3.2
Status: open Resolution:
Dependencies: Superseder: Lookback with group references incorrect (two issues?)
View: 9179
Assigned To: effbot Nosy List: effbot, glchapman, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2003-09-29 03:31 by glchapman, last changed 2014-10-24 12:05 by serhiy.storchaka.

File name Uploaded Description
sre_parse.patch glchapman, 2003-11-02 15:31
Messages (7)
msg18411 - (view) Author: Greg Chapman (glchapman) Date: 2003-09-29 03:31
I was trying to get a pattern like this to work:

   pat = re.compile(r'(?<=(...)\1)abc')
   pat.match('jkljklabc', 6)

Unfortunately, that doesn't work.  The problem is that 
sre_parse.Subpattern.getwidth() ignores GROUPREFs 
when calculating the width, so the subpattern in the 
assertion is deemed to have length of 3 (I was hoping 
that sre could detect that the group 1 had a fixed 
length, so the reference to it would also have a fixed length).
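
To make the width problem concrete, here is a toy model of the calculation. It is only a sketch: the (op, arg) tuples and both function names are made up for illustration and are not sre_parse's real data structures or API. The first function skips GROUPREFs the way the report describes; the second shows the hoped-for behavior, where a reference takes the width of the group it points to.

```python
# Toy stand-in for a parsed pattern: a list of (op, arg) pairs.
# These op names and this layout are assumptions for illustration only.
ANY, GROUP, GROUPREF = "any", "group", "groupref"


def width_ignoring_refs(items):
    """Sum the fixed widths, silently skipping GROUPREF items."""
    total = 0
    for op, arg in items:
        if op == ANY:
            total += 1
        elif op == GROUP:
            _gid, sub = arg
            total += width_ignoring_refs(sub)
        # GROUPREF is not handled, so it contributes nothing.
    return total


def width_tracking_groups(items, group_widths=None):
    """Like the above, but a GROUPREF counts as wide as its group."""
    if group_widths is None:
        group_widths = {}
    total = 0
    for op, arg in items:
        if op == ANY:
            total += 1
        elif op == GROUP:
            gid, sub = arg
            width = width_tracking_groups(sub, group_widths)
            group_widths[gid] = width  # remember for later references
            total += width
        elif op == GROUPREF:
            total += group_widths[arg]
    return total


# Body of the lookbehind in (?<=(...)\1)abc: group 1 is three ANYs,
# followed by a reference to group 1.
lookbehind = [(GROUP, (1, [(ANY, None)] * 3)), (GROUPREF, 1)]

print(width_ignoring_refs(lookbehind))    # 3
print(width_tracking_groups(lookbehind))  # 6
```

With a width of 3 the engine steps back only three characters before trying to match a six-character subpattern, which is consistent with the match failing.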

I've since discovered that both Perl and PerlRE cannot 
handle the above pattern, but they both generate 
exceptions indicating that the assertion has a variable 
length pattern.  I think it would be a good idea if sre 
generated an exception as well (rather than silently 
ignoring GROUPREFs).

msg18412 - (view) Author: Greg Chapman (glchapman) Date: 2003-11-02 15:31
Attached is a patch which gives GROUPREFs an arbitrary 
variable width, so that they raise an exception if used in a 
lookbehind assertion.  Obviously, it would be better if 
GROUPREFs returned the length of the group to which they 
refer, but I don't see any obvious way for getwidth() to get 
that information (perhaps I missed something?).
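
The idea behind the patch can be sketched roughly as follows. This is not the contents of sre_parse.patch; the names getwidth_sketch, check_lookbehind and MAX_WIDTH are hypothetical, and the pattern is again modeled as a plain list of op names. The point is only that once a GROUPREF widens the (lo, hi) range, the usual fixed-width check for lookbehinds rejects the pattern.

```python
MAX_WIDTH = 65535  # hypothetical "could be anything" upper bound


def getwidth_sketch(ops):
    """Return a (lo, hi) width range for a flat list of op names."""
    lo = hi = 0
    for op in ops:
        if op == "groupref":
            # Width unknown: may match nothing or arbitrarily much.
            hi += MAX_WIDTH
        else:
            # Every other op in this toy model matches one character.
            lo += 1
            hi += 1
    return lo, min(hi, MAX_WIDTH)


def check_lookbehind(ops):
    """Raise if the lookbehind body is not fixed-width, else return its width."""
    lo, hi = getwidth_sketch(ops)
    if lo != hi:
        raise ValueError("look-behind requires fixed-width pattern")
    return lo


print(check_lookbehind(["any", "any", "any"]))  # 3

try:
    check_lookbehind(["any", "any", "any", "groupref"])
except ValueError as exc:
    print("rejected:", exc)
```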
msg83234 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2009-03-06 03:11
As part of issue #2636 group references now work in lookbehinds.

However, your example:


will fail but:


will succeed.

Why? Well, in lookbehinds it searches backwards. In the first regex it
sees the group reference before the capture, whereas in the second it
sees the group reference after the capture. (Hope that's clear! :-))
msg114290 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-18 22:39
I've deliberately changed the stage to patch review and the version to 3.2 to highlight the fact that a lot of work will be needed to get the new regex engine into the standard library.  Feel free to change these as you see fit.
msg190042 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-05-26 00:41
Can this be closed as a result of work done via #2636 or must it remain open?
msg190044 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2013-05-26 00:57
Issue #2636 resulted in the regex module, which supports variable-length look-behinds.

I don't know how much work it would take even to put a limited fixed-length look-behind fix for this into the re module, so I'm afraid the issue must remain open.
msg229918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-24 12:05
The patch for issue9179 fixes this issue too.
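
A quick way to check how a given interpreter treats these patterns is sketched below. The helper name compiles is made up here. The expectation is an assumption based on my understanding of recent CPython releases: a reference to a group defined inside the same lookbehind is rejected at compile time, while a reference to a group captured before the lookbehind is accepted. Older interpreters may behave differently.

```python
import re


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The pattern from the original report: group 1 is both defined and
# referenced inside the same lookbehind.
print(compiles(r'(?<=(...)\1)abc'))

# A group captured before the lookbehind, then referenced inside it.
print(compiles(r'(a)a(?<=\1)c'))
print(re.match(r'(a)a(?<=\1)c', 'aac') is not None)
```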
