Issue 16870: re fails to match ^ when start index is specified ?

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/61074

classification

Title:	re fails to match ^ when start index is specified ?
Type:		Stage:	resolved
Components:	Regular Expressions	Versions:	Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	bsdphk, ezio.melotti, mrabarnett, ned.deily
Priority:	normal	Keywords:

Created on 2013-01-05 09:21 by bsdphk, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg179116 - (view)	Author: Poul-Henning Kamp (bsdphk)	Date: 2013-01-05 09:21
I'm surprised that this does not find any matches:

    import re
    r = re.compile("^abc")
    s = "0123abcxyz"
    for i in range(0, len(s)):
        print(i, r.search(s, i))

I would have expected the i==4 case to match?

(This is on: Python 2.7.3 (default, Dec 14 2012, 02:46:02) [GCC 4.2.1 Compatible FreeBSD Clang 3.2 (branches/release_32 168974)] on freebsd10)
msg179117 - (view)	Author: Ned Deily (ned.deily) *	Date: 2013-01-05 10:01
Note the warning about '^' in the documentation for the re search method:

"The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start."

http://docs.python.org/2/library/re.html#re.RegexObject.search
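The difference the documentation describes can be seen in a short sketch. It reuses the pattern and string from the original report; the variable names and the second string containing a newline are illustrative additions, not part of the thread.

```python
import re

r = re.compile("^abc")
s = "0123abcxyz"

# pos only moves where scanning starts; '^' still refers to index 0 of s,
# so no match is found.
with_pos = r.search(s, 4)

# Slicing creates a new string whose real beginning is the old index 4,
# so '^' matches there (at the cost of a copy).
with_slice = r.search(s[4:])

# With re.MULTILINE, '^' also matches just after a newline, even when the
# search starts at that position via pos.
multiline = re.compile("^abc", re.MULTILINE)
after_newline = multiline.search("0123\nabcxyz", 5)

print(with_pos, with_slice, after_newline)
```

Here only the call that combines '^' with a plain pos argument fails; the sliced string and the position just after a newline both match.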
msg179118 - (view)	Author: Ned Deily (ned.deily) *	Date: 2013-01-05 10:22
To expand a bit, rather than multiple calls to search, you can use the start and end methods of the match object to determine where the pattern (without the '^' anchor) matches in the string. For example:

    import re
    r = re.compile("abc")
    s = "0123abcxyz"
    match = r.search(s)
    if match:
        print(match.start(), match.end())
msg179121 - (view)	Author: Poul-Henning Kamp (bsdphk)	Date: 2013-01-05 12:58
I have tried hard to figure out why you have chosen the semantics for ^ you mention, and to come up with a plausible use case for them, and I have utterly failed at both. I find them distinctly counterintuitive. I think the Principle of Least Astonishment compliant definition of ^ and $ would be that they match the start and end of the string offered for matching, i.e. taking start+end into account.

The real use case behind this is searching through a mmap'ed database file for a particular regexp in a particular field of the records, with the minimum amount of copying. The semantics you mention make ^ and $ useless in this and, as far as I can tell, in any other scenario involving start+end arguments.
msg179132 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2013-01-05 18:02
The semantics of '^' are common to many different regex implementations, including those of Perl and C#. The 'pos' argument merely gives the starting position of the search (C# also lets you provide a starting position, and behaves in exactly the same way). Perhaps you should be using 'match' instead.
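A minimal sketch of the 'match' suggestion, assuming the same string as in the original report: a compiled pattern's match method anchors at pos itself, so the '^' is dropped from the pattern. The fullmatch call is an extra illustration for the "field inside a record" use case and assumes Python 3.4 or later, where fullmatch exists; it is not available in the Python 2.7 this issue was filed against. The names hits and field are hypothetical.

```python
import re

s = "0123abcxyz"
r = re.compile("abc")   # no '^' here: match() anchors at pos on its own

# Try every start index; match() only succeeds where the pattern begins
# exactly at that index, and no slices of s are created.
hits = [i for i in range(len(s)) if r.match(s, i)]

# fullmatch(string, pos, endpos) requires the pattern to cover exactly the
# region [pos, endpos), which approximates "match this field" without copying.
# Assumption: Python 3.4+.
field = r.fullmatch(s, 4, 7)

print(hits, field.span() if field else None)
```

The same calls accept a bytes pattern against a bytes-like buffer, which is the shape the mmap use case in the thread would need; that variant is not shown here.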

History
Date	User	Action	Args
2022-04-11 14:57:40	admin	set	github: 61074
2013-01-05 18:02:24	mrabarnett	set	messages: + msg179132
2013-01-05 12:58:44	bsdphk	set	messages: + msg179121
2013-01-05 10:22:02	ned.deily	set	messages: + msg179118
2013-01-05 10:01:41	ned.deily	set	status: open -> closed nosy: + ned.deily messages: + msg179117 resolution: not a bug stage: resolved
2013-01-05 09:21:55	bsdphk	create