Issue 1708652: Exact matching

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/44907

classification

Title:	Exact matching
Type:	enhancement	Stage:
Components:	Regular Expressions	Versions:	Python 3.2

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	Proposal: add re.fullmatch() method View: 16203
Assigned To:		Nosy List:	georg.brandl, loewis, mrabarnett, niemeyer, r.david.murray, rhettinger, serhiy.storchaka, timehorse, tlynn
Priority:	normal	Keywords:

Created on 2007-04-27 11:35 by tlynn, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (15)
msg55094 - (view)	Author: Tom Lynn (tlynn)	Date: 2007-04-27 11:35
I'd like to see a regexp.exact() method on regexp objects, equivalent to regexp.search(r'\A%s\Z' % pattern, ...), for parsing binary formats. It's probably not worth disturbing the current library interface for, but maybe in Py3k?
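A minimal sketch of what the requested behaviour could look like as a plain function. The names `exact` and `naive_exact` are hypothetical and are not part of the `re` module. The sketch wraps the pattern in a non-capturing group, as a later message in this thread does with `'(?:%s)\Z'`, because the bare `r'\A%s\Z' % pattern` form anchors only the first and last branch of a top-level alternation.

```python
import re


def exact(pattern, string, flags=0):
    """Hypothetical helper: succeed only if `pattern` matches all of `string`."""
    # The non-capturing group keeps a top-level alternation such as 'a|b'
    # together, so both anchors apply to every branch.
    return re.search(r'\A(?:%s)\Z' % pattern, string, flags)


def naive_exact(pattern, string, flags=0):
    """The literal formula from the request, without the grouping."""
    return re.search(r'\A%s\Z' % pattern, string, flags)


print(exact(r'a|b', 'a') is not None)         # True
print(exact(r'a|b', 'ab') is not None)        # False
print(naive_exact(r'a|b', 'ab') is not None)  # True: parsed as (\Aa)|(b\Z)
```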
msg55095 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-05-01 15:42
Moving to the feature requests tracker. Notice that in Py3k, the string type will be a Unicode type, so it's not clear to me that regular expressions on binary data will still be supported.
msg74685 - (view)	Author: Jeffrey C. Jacobs (timehorse)	Date: 2008-10-13 13:31
Binary format searches should be supported once issue 1282 is implemented, likely as part of issue 2636 Item 32. That said, I'm not clear what you mean by exact search; wouldn't you want match instead? If your main issue is you want something that automatically binds to the beginning and ending of input, then I suppose we could add an 'exact' method where 'search' searches anywhere, 'match' matches from the start of input and 'exact' matches from beginning to ending. I'd call that a separate issue, though. In other words: byte-oriented matching is covered by 1282 and adding an 'exact' method is the only new issue here. Does that sound right?
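The three anchoring behaviours described above can be compared side by side. This sketch uses `fullmatch()`, the name under which the feature was eventually implemented (see the superseding issue 16203), in place of the `exact` name discussed here. The pattern and the sample strings are illustrative choices only.

```python
import re

pattern = re.compile(r'\d+')

# For each sample, record whether search / match / fullmatch succeed.
results = {}
for text in ('123', '123abc', 'abc123'):
    results[text] = (
        pattern.search(text) is not None,     # anywhere in the input
        pattern.match(text) is not None,      # anchored at the start only
        pattern.fullmatch(text) is not None,  # anchored at both ends
    )

for text, outcome in results.items():
    print(repr(text), outcome)
# '123'    (True, True, True)
# '123abc' (True, True, False)
# 'abc123' (True, False, False)
```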
msg74688 - (view)	Author: Tom Lynn (tlynn)	Date: 2008-10-13 14:46
Yes, that's right. The binary aspect of it was something of a red herring, I'm afraid, although I still think that (or parsing in general) is an important use case. The prime motivation is that it's easy to either forget the '\Z' or to use '$' instead, both of which cause subtle bugs. An exact() method might help to avoid that.
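The two mistakes named above can be illustrated with a small validation routine. The function names are hypothetical. The first variant forgets the terminator, the second uses `$`, and the third uses `\Z`.

```python
import re


def is_number_unanchored(text):
    # Terminator forgotten: only a prefix of the input has to match.
    return re.match(r'\d+', text) is not None


def is_number_dollar(text):
    # '$' also matches just before a single trailing newline.
    return re.match(r'\d+$', text) is not None


def is_number_strict(text):
    # '\Z' matches only at the very end of the input.
    return re.match(r'\d+\Z', text) is not None


for sample in ('123', '123abc', '123\n'):
    print(repr(sample),
          is_number_unanchored(sample),
          is_number_dollar(sample),
          is_number_strict(sample))
# '123'    True True  True
# '123abc' True False False
# '123\n'  True True  False
```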
msg116676 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2010-09-17 16:08
Does this request still stand? If so then I'll add it to the new regex module.
msg116688 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-09-17 17:27
I would say you should make the call on whether or not it is worth adding. IIUC it would mean there was more than one way to do something (\Z vs 'exact'), so I personally am -0 on the feature request. But I'm not a frequent regex user, so I don't think my opinion should count for much.
msg116724 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-09-17 21:48
I don't know whether it should stand, I'm somewhere around 0 on it myself. So I guess that means it shouldn't, since it's easier to add features than remove them. The problem is that once you're aware of the need for it you need it less. In case other people are +1, I'll note that "exact" isn't a very nice name either, not being a verb. "exact_match" is a bit long but probably better (and better than "match_exact").
msg116755 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2010-09-18 06:34
I'm not sure it really is so useful that it warrants a new regex method.
msg116764 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-09-18 11:42
I'm still unsure. I think this confusion does cause bugs in real-world code. Perhaps more prominence for \A and \Z in the docs? There's already a section comparing regexps starting '^' with match under "Matching vs Searching". The problem is basically that ^ and $ have weird semantics but are better recognised than \A and \Z. Looking over the docs again I see that the docs for $ are still misleading, in a way that's related to this issue: foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. "foo$ matches only 'foo' (out of 'foo' and 'foobar')" is the correct interpretation of that, but it's easy to read it as "foo$ means exact_match('foo')", which is the misconception I was hoping to put to rest with this (foo$ also matches the 'foo' part of 'foo\nbar', even with flags=0).
msg116765 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-09-18 11:57
Actually, looking at the second part of the docs for $ (on "foo.$") makes me think the main motivating case here may be a bug in re.match::

    >>> re.match('foo$', 'foo\n\n')
    >>> re.match('foo$', 'foo\n')
    <_sre.SRE_Match object at 0x00A98678>

Shortening an input string shouldn't ever cause it to match, should it?
msg116771 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-09-18 12:51
Oh dear, I'm wrong on two fronts (I wish Roundup had post editing). a) foo$ doesn't match the 'foo' part of 'foo\nbar' as I stated above, but does match the 'foo' part of 'foo\n'. b) Obviously shortening an input string can cause it to match. It's still weird though.
msg116833 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2010-09-18 22:45
Can we close this one?
msg116837 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2010-09-18 23:03
'$' matches at the end of the string or at a newline at the end of a string (if multiline mode isn't turned on). '\Z' matches only at the end of the string. If not even the OP is convinced of the need, then I have no objection to closing.
msg116843 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-09-19 00:12
(Sorry to comment on a closed issue, it was closed as I was writing this.)

It's not that I'm not convinced of the need, just not of the solution. I still think that there are problems here:

a) forgetting any \Z or $ terminator to .match() is easy,
b) $ is easily misunderstood (and not just by me) and I suspect commonly dangerously misused in validation routines as a result,
c) '(?:%s)\Z' % regexp is noisy, combines two less-understood features, and makes simple regexps hard to read,
d) '(?:%s)\Z' % regexp.pattern requires recompilation of the regexp.

I think another method is probably the best solution to these, but it may have too much cost (though I'm not sure what that cost would be).

Largely orthogonally, I'd like to see \Z encouraged over $ in the docs, and preferably a version of this table (probably under Matching vs Searching), corrected if I'm wrong of course:

NON-MULTILINE:
    '^' is equivalent to '\A'
    '$' is equivalent to '(?:\Z|(?=\n\Z))'
MULTILINE:
    '^' is equivalent to '(?:\A|(?<=\n))'
    '$' is equivalent to '(?:\Z|(?=\n))'

But the docs already try to express the above table (or its correction) in English, so you may feel it wouldn't add anything, in which case I'd still like to see any corrections for my own edification if possible.
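One way to sanity-check the table above is to compare, for a handful of inputs, the positions at which each anchor and its spelled-out form match. This is only a spot check on sample strings chosen for this sketch, not a proof of equivalence. The helper names are hypothetical.

```python
import re

# (flags, anchor, spelled-out form) rows taken from the table above.
EQUIVALENTS = [
    (0, '^', r'\A'),
    (0, '$', r'(?:\Z|(?=\n\Z))'),
    (re.MULTILINE, '^', r'(?:\A|(?<=\n))'),
    (re.MULTILINE, '$', r'(?:\Z|(?=\n))'),
]

# Sample inputs (an assumption of this sketch) with and without newlines.
SAMPLES = ['', 'foo', 'foo\n', 'foo\n\n', 'foo\nbar', '\nfoo', 'foo\nbar\n']


def positions(pattern, text, flags):
    """Return every offset at which the zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, flags)]


def table_holds():
    """True if each anchor and its spelled-out form agree on all samples."""
    return all(
        positions(anchor, text, flags) == positions(spelled_out, text, flags)
        for flags, anchor, spelled_out in EQUIVALENTS
        for text in SAMPLES
    )


print(table_holds())
print(positions('$', 'foo\n', 0))  # [3, 4]: before the final newline and at the end
```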
msg230856 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-11-08 13:26
Was implemented as fullmatch() in issue16203.
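A short sketch of how `fullmatch()`, available on compiled patterns since Python 3.4, addresses point d) from the earlier message: no second pattern has to be built and compiled. The pattern and the names `number`, `number_exact` and `agree` are illustrative assumptions.

```python
import re

number = re.compile(r'\d+|[a-f]+')

# Workaround from the thread: wrap the original source and recompile it.
number_exact = re.compile(r'(?:%s)\Z' % number.pattern, number.flags)


def agree(text):
    """True if the recompiled workaround and fullmatch() give the same verdict."""
    old_way = number_exact.match(text) is not None
    new_way = number.fullmatch(text) is not None
    return old_way == new_way


for sample in ('123', '123\n', 'abc', 'abcx', ''):
    print(repr(sample), number.fullmatch(sample) is not None, agree(sample))
```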

History
Date	User	Action	Args
2022-04-11 14:56:24	admin	set	github: 44907
2014-11-08 13:26:22	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg230856 resolution: rejected -> duplicate superseder: Proposal: add re.fullmatch() method
2010-09-19 00:13:00	tlynn	set	messages: + msg116843
2010-09-18 23:34:27	rhettinger	set	status: open -> closed resolution: rejected
2010-09-18 23:03:54	mrabarnett	set	messages: + msg116837
2010-09-18 22:45:43	rhettinger	set	nosy: + rhettinger messages: + msg116833
2010-09-18 12:51:35	tlynn	set	messages: + msg116771
2010-09-18 11:57:25	tlynn	set	messages: + msg116765
2010-09-18 11:42:17	tlynn	set	messages: + msg116764
2010-09-18 06:34:46	georg.brandl	set	nosy: + georg.brandl messages: + msg116755
2010-09-17 21:48:39	tlynn	set	messages: + msg116724
2010-09-17 17:27:22	r.david.murray	set	assignee: niemeyer -> messages: + msg116688 nosy: + r.david.murray
2010-09-17 16:08:11	mrabarnett	set	messages: + msg116676
2010-09-17 15:49:01	BreamoreBoy	set	nosy: + mrabarnett
2010-08-21 23:23:23	georg.brandl	set	versions: + Python 3.2, - Python 2.7
2008-10-13 14:46:41	tlynn	set	messages: + msg74688
2008-10-13 13:31:13	timehorse	set	messages: + msg74685
2008-09-28 19:31:24	timehorse	set	nosy: + timehorse versions: + Python 2.7
2007-04-27 11:35:43	tlynn	create