classification
Title: Grouprefs in lookbehind assertions
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder: Lookback with group references incorrect (two issues?) (issue 9179)
Assigned To: serhiy.storchaka
Nosy List: effbot, glchapman, mrabarnett, python-dev, serhiy.storchaka
Priority: normal
Keywords:

Created on 2003-09-29 03:31 by glchapman, last changed 2014-11-07 21:29 by serhiy.storchaka. This issue is now closed.

Files
File name: sre_parse.patch
Uploaded: glchapman, 2003-11-02 15:31
Messages (10)
msg18411 - Author: Greg Chapman (glchapman) Date: 2003-09-29 03:31
I was trying to get a pattern like this to work:

   import re
   pat = re.compile(r'(?<=(...)\1)abc')
   pat.match('jkljklabc', 6)

Unfortunately, that doesn't work. The problem is that
sre_parse.Subpattern.getwidth() ignores GROUPREFs
when calculating the width, so the subpattern in the
assertion is deemed to have a length of 3 rather than 6.
(I was hoping that sre could detect that group 1 has a
fixed length, so that the reference to it would also have
a fixed length.)

I've since discovered that both Perl and PerlRE cannot 
handle the above pattern, but they both generate 
exceptions indicating that the assertion has a variable 
length pattern.  I think it would be a good idea if sre 
generated an exception as well (rather than silently 
ignoring GROUPREFs).
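
To illustrate the width miscalculation described above, here is a minimal toy model. It is not the actual sre_parse code: the `width` function, the tuple-based pattern representation, and the `count_grouprefs` flag are all hypothetical names made up for this sketch. It only shows how skipping group references yields 3 for `(...)\1` where 6 would be expected.

```python
# Toy model of a regex width calculation; NOT the real sre_parse code.
# A pattern is a list of (kind, value) items:
#   ("any", n)                n wildcard characters, like "..."
#   ("group", (gid, items))   a capturing group numbered gid
#   ("groupref", gid)         a back-reference such as \1

def width(items, count_grouprefs, group_widths=None):
    """Return the total width of items, recording group widths on the way."""
    if group_widths is None:
        group_widths = {}
    total = 0
    for kind, value in items:
        if kind == "any":
            total += value
        elif kind == "group":
            gid, body = value
            group_width = width(body, count_grouprefs, group_widths)
            group_widths[gid] = group_width
            total += group_width
        elif kind == "groupref" and count_grouprefs:
            # Use the width already recorded for the referenced group.
            total += group_widths[value]
        # When count_grouprefs is False, back-references add nothing,
        # which mirrors the behavior reported in this issue.
    return total

# Hypothetical encoding of the lookbehind body (...)\1
lookbehind = [("group", (1, [("any", 3)])), ("groupref", 1)]

ignoring = width(lookbehind, count_grouprefs=False)
counting = width(lookbehind, count_grouprefs=True)
print(ignoring, counting)
```

With the reference ignored, a matcher would step back only 3 characters before trying to match a 6-character subpattern, which is why the assertion cannot succeed at the intended position.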

msg18412 - Author: Greg Chapman (glchapman) Date: 2003-11-02 15:31
Attached is a patch which gives GROUPREFs an arbitrary 
variable width, so that they raise an exception if used in a 
lookbehind assertion.  Obviously, it would be better if 
GROUPREFs returned the length of the group to which they 
refer, but I don't see any obvious way for getwidth() to get 
that information (perhaps I missed something?).

msg83234 - Author: Matthew Barnett (mrabarnett) Date: 2009-03-06 03:11
As part of issue #2636 group references now work in lookbehinds.

However, your example:

    (?<=(...)\1)abc

will fail but:

    (?<=\1(...))abc

will succeed.

Why? Well, in lookbehinds it searches backwards. In the first regex it
sees the group reference before the capture, whereas in the second it
sees the group reference after the capture. (Hope that's clear! :-))

msg114290 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-18 22:39
I've deliberately changed the stage to patch review and the version to 3.2 to highlight the fact that a lot of work will be needed to get the new regex engine into the standard library. Feel free to change these as you see fit.

msg190042 - Author: Mark Lawrence (BreamoreBoy) Date: 2013-05-26 00:41
Can this be closed as a result of work done via #2636 or must it remain open?

msg190044 - Author: Matthew Barnett (mrabarnett) Date: 2013-05-26 00:57
Issue #2636 resulted in the regex module, which supports variable-length look-behinds.

I don't know how much work it would take even to put a limited fixed-length look-behind fix for this into the re module, so I'm afraid the issue must remain open.

msg229918 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-24 12:05
The patch for issue9179 fixes this issue too.

msg230824 - Author: Roundup Robot (python-dev) Date: 2014-11-07 19:49
New changeset fac649bf2d10 by Serhiy Storchaka in branch '2.7':
Issues #814253, #9179: Group references and conditional group references now
https://hg.python.org/cpython/rev/fac649bf2d10

New changeset 9fcf4008b626 by Serhiy Storchaka in branch '3.4':
Issues #814253, #9179: Group references and conditional group references now
https://hg.python.org/cpython/rev/9fcf4008b626

New changeset 60fccf0aad83 by Serhiy Storchaka in branch 'default':
Issues #814253, #9179: Group references and conditional group references now
https://hg.python.org/cpython/rev/60fccf0aad83

msg230827 - Author: Roundup Robot (python-dev) Date: 2014-11-07 20:37
New changeset 0e2c7d774df3 by Serhiy Storchaka in branch '2.7':
Silence the failure of test_pyclbr after adding a property in sre_parse
https://hg.python.org/cpython/rev/0e2c7d774df3

New changeset 246c9570a757 by Serhiy Storchaka in branch '3.4':
Silence the failure of test_pyclbr after adding a property in sre_parse
https://hg.python.org/cpython/rev/246c9570a757

New changeset b2c17681404f by Serhiy Storchaka in branch 'default':
Silence the failure of test_pyclbr after adding a property in sre_parse
https://hg.python.org/cpython/rev/b2c17681404f

msg230829 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-11-07 21:29
Now group references to groups with fixed width are supported in lookbehind assertions.
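
A short check of the resolved behavior, assuming a CPython 3.5 or later interpreter. The helper `compile_error` is a name made up for this example, and since error messages vary between versions the sketch only records whether compilation fails. As far as I can tell, the pattern from the original report is still rejected, because its group is defined inside the same lookbehind that refers to it, whereas a reference to a fixed-width group captured before the lookbehind is accepted.

```python
import re

def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# Group 1 has a fixed width of 1 and is captured before the lookbehind,
# so the reference inside the lookbehind is accepted.
matched = re.match(r'(a)a(?<=\1)c', 'aac') is not None

# The pattern from the original report: group 1 is defined inside the
# same lookbehind that refers to it.
same_lookbehind = compile_error(r'(?<=(...)\1)abc')

# A reference to a variable-width group makes the lookbehind variable-width.
variable_width = compile_error(r'(a+)b(?<=\1)c')

print("fixed-width group reference matched:", matched)
print("group defined in same lookbehind:", same_lookbehind)
print("variable-width group reference:", variable_width)
```

For patterns that need variable-length lookbehinds, the third-party regex module mentioned earlier in this thread is the alternative.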
History
Date  User  Action  Args
2014-11-07 21:29:40  serhiy.storchaka  set
    status: open -> closed
    versions: + Python 2.7, Python 3.4, Python 3.5, - Python 3.2
    messages: + msg230829
    assignee: effbot -> serhiy.storchaka
    resolution: fixed
    stage: patch review -> resolved
2014-11-07 20:37:06  python-dev  set
    messages: + msg230827
2014-11-07 19:49:37  python-dev  set
    nosy: + python-dev
    messages: + msg230824
2014-10-24 12:05:32  serhiy.storchaka  set
    superseder: Lookback with group references incorrect (two issues?)
    messages: + msg229918
    nosy: + serhiy.storchaka
2014-02-03 17:13:53  BreamoreBoy  set
    nosy: - BreamoreBoy
2013-05-26 00:57:26  mrabarnett  set
    messages: + msg190044
2013-05-26 00:42:00  BreamoreBoy  set
    messages: + msg190042
2010-08-18 22:39:59  BreamoreBoy  set
    versions: + Python 3.2, - Python 2.7
    nosy: + BreamoreBoy
    messages: + msg114290
    stage: patch review
2010-08-03 19:37:32  terry.reedy  set
    versions: + Python 2.7, - Python 2.5, Python 2.4, Python 2.3
2009-03-06 03:11:25  mrabarnett  set
    nosy: + mrabarnett
    messages: + msg83234
2007-09-11 06:25:46  effbot  set
    type: behavior
    versions: + Python 2.5, Python 2.4
2003-09-29 03:31:55  glchapman  create