classification
Title: Lookbehind assertions go behind the start position for the match
Type: behavior
Stage: resolved
Components: Documentation, Regular Expressions
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Devin Jeanpierre, docs@python, ezio.melotti, mrabarnett
Priority: normal
Keywords:

Created on 2012-02-12 08:54 by Devin Jeanpierre, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg153188 - Author: Devin Jeanpierre (Devin Jeanpierre) Date: 2012-02-12 08:54
The match method of compiled regex objects accepts an optional "pos" parameter, which is described as roughly equivalent to slicing except for how it treats the "^" operation. See http://docs.python.org/library/re.html#re.RegexObject.search

However, the behavior of lookbehind assertions also differs:

>>> re.compile("(?<=a)b").match("ab", 1)
<_sre.SRE_Match object at 0x...>
>>> re.compile("(?<=a)b").match("ab"[1:])
>>>

This alone might be a documentation bug, but the behavior is also inconsistent with the behavior of lookahead assertions, which do *not* look past the endpos:

>>> re.compile("a(?=b)").match("ab", 0, 1)
>>> re.compile("a(?=b)").match("ab")
<_sre.SRE_Match object at 0x...>
>>>
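
For readers on Python 3, the two observations above can be reproduced with a short script. This is only a sketch of the reported behavior; the variable names are illustrative and do not come from the original report.

```python
import re

lookbehind = re.compile(r"(?<=a)b")
lookahead = re.compile(r"a(?=b)")

# With pos=1 the lookbehind can still inspect the "a" at index 0.
with_pos = lookbehind.match("ab", 1)
# After slicing, the "a" is gone, so the lookbehind fails.
with_slice = lookbehind.match("ab"[1:])

# With endpos=1 the string is treated as if it ended after "a",
# so the lookahead cannot see the "b".
with_endpos = lookahead.match("ab", 0, 1)
without_endpos = lookahead.match("ab")

print(with_pos is not None)        # True
print(with_slice is not None)      # False
print(with_endpos is not None)     # False
print(without_endpos is not None)  # True
```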
msg153284 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2012-02-13 17:39
The documentation says of the 'pos' parameter "This is not completely equivalent to slicing the string" and of the 'endpos' parameter "it will be as if the string is endpos characters long".

In other words, it starts searching at 'pos' but truncates at 'endpos'.

Yes, it's inconsistent, but it's documented.
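
The same asymmetry shows up with the "^" and "$" anchors, which may make the documented wording easier to picture. The snippet below is a minimal sketch; the patterns and variable names are chosen for illustration only.

```python
import re

text = "ab"

# "^" anchors to the real start of the string, so it does not match at pos=1,
caret_with_pos = re.compile(r"^b").match(text, 1)
# although it does match once the string is actually sliced.
caret_with_slice = re.compile(r"^b").match(text[1:])

# "$" treats endpos as the end of the string, just as slicing would.
dollar_with_endpos = re.compile(r"a$").match(text, 0, 1)
dollar_with_slice = re.compile(r"a$").match(text[:1])

print(caret_with_pos is not None)      # False
print(caret_with_slice is not None)    # True
print(dollar_with_endpos is not None)  # True
print(dollar_with_slice is not None)   # True
```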
msg153285 - Author: Devin Jeanpierre (Devin Jeanpierre) Date: 2012-02-13 17:54
If it's intended behaviour, then I'd request that the documentation specifically mention lookbehind assertions the way it does with "^".

Saying "it's slightly different" doesn't make clear the ways in which it is different, and that's important for people writing or using regular expressions.
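
For anyone who needs strict slicing semantics in the meantime, one possible workaround is to slice explicitly before matching. The helper name match_in_slice below is hypothetical and not part of the re module; the sketch assumes non-negative indices, and the offsets of the returned match are relative to the slice rather than to the original string.

```python
import re

def match_in_slice(pattern, string, pos=0, endpos=None):
    """Hypothetical helper: match as if string[pos:endpos] were the whole string."""
    if endpos is None:
        endpos = len(string)
    # Slicing hides everything outside [pos, endpos) from anchors and lookarounds.
    return pattern.match(string[pos:endpos])

lookbehind = re.compile(r"(?<=a)b")

native = lookbehind.match("ab", 1)           # lookbehind sees the "a" before pos
sliced = match_in_slice(lookbehind, "ab", 1)  # lookbehind sees nothing before "b"

print(native is not None)  # True
print(sliced is not None)  # False
```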
msg154626 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-02-29 12:34
IMHO the documentation is fine as is. Using pos in combination with lookarounds that match at the beginning/end of the "slice" seems a rather uncommon corner case, and I don't think it's worth documenting. Even if it were documented, as a user, I would just try it from the interpreter anyway, rather than checking the docs for some prose to decipher.
History
Date                 User              Action  Args
2022-04-11 14:57:26  admin             set     github: 58206
2012-02-29 12:34:34  ezio.melotti      set     status: open -> closed; assignee: docs@python; components: + Documentation; versions: + Python 3.3, - Python 3.1; nosy: + docs@python; messages: + msg154626; resolution: wont fix; stage: resolved
2012-02-13 17:54:24  Devin Jeanpierre  set     messages: + msg153285
2012-02-13 17:39:22  mrabarnett        set     nosy: + mrabarnett; messages: + msg153284
2012-02-12 08:54:43  Devin Jeanpierre  create