Title: Exact matching
Type: enhancement
Components: Regular Expressions
Versions: Python 3.2
Status: closed
Resolution: duplicate
Superseder: Proposal: add re.fullmatch() method (issue 16203)
Nosy List: georg.brandl, loewis, mrabarnett, niemeyer, r.david.murray, rhettinger, serhiy.storchaka, timehorse, tlynn
Priority: normal

Created on 2007-04-27 11:35 by tlynn, last changed 2014-11-08 13:26 by serhiy.storchaka. This issue is now closed.

Messages (15)
msg55094 - Author: Tom Lynn (tlynn) Date: 2007-04-27 11:35
I'd like to see a regexp.exact() method on regexp objects, equivalent to re.match('\A%s\Z' % pattern, ...), for parsing binary formats.

It's probably not worth disturbing the current library interface for, but maybe in Py3k?
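A minimal sketch of the requested behaviour, written as a hypothetical module-level helper `exact()` (the name is taken from the request; it is not an existing re API). It assumes the goal is to accept a match only when the pattern consumes the whole input, and it handles both str and bytes patterns since binary formats were the stated use case.

```python
import re

def exact(pattern, string, flags=0):
    """Hypothetical helper: match `pattern` against the whole of `string`.

    Roughly what the proposed regexp.exact() would do; not part of re.
    """
    # Wrap the pattern in a non-capturing group so that an alternation
    # such as 'a|b' is anchored as a whole, then anchor with \A and \Z
    # (\Z rather than $, because $ also matches before a final newline).
    if isinstance(pattern, bytes):
        anchored = rb'\A(?:' + pattern + rb')\Z'
    else:
        anchored = r'\A(?:' + pattern + r')\Z'
    return re.match(anchored, string, flags)


# A binary header check: bytes pattern against bytes input.
print(exact(rb'\x89PNG', b'\x89PNG') is not None)    # True
print(exact(rb'\x89PNG', b'\x89PNG\n') is not None)  # False
print(exact('a|b', 'ab') is not None)                # False
```

Without the `(?:...)` group, `'\Aa|b\Z'` would accept `'ab'`, because the first alternative only needs to match at the start.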
msg55095 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-05-01 15:42
Moving to the feature requests tracker.

Notice that in Py3k, the string type will be a Unicode type, so it's not clear to me that regular expressions on binary data will still be supported.
msg74685 - Author: Jeffrey C. Jacobs (timehorse) Date: 2008-10-13 13:31
Binary format searches should be supported once issue 1282 is implemented, 
likely as part of issue 2636 Item 32.  That said, I'm not clear what you 
mean by exact search; wouldn't you want match instead?  If your main issue 
is you want something that automatically binds to the beginning and ending 
of input, then I suppose we could add an 'exact' method where 'search' 
searches anywhere, 'match' matches from the start of input and 'exact' 
matches from beginning to ending.  I'd call that a separate issue, though.  
In other words: byte-oriented matching is covered by 1282, and adding an 
'exact' method is the only new issue here.  Does that sound right?
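The three behaviours described above (search anywhere, match from the start, match from beginning to end) can be compared directly in Python 3.4 and later, where the third one eventually became `fullmatch()`. The pattern and inputs below are made up for illustration.

```python
import re

number = re.compile(r'\d+')

# search(): the match may begin anywhere in the input.
print(number.search('id 42').span())     # (3, 5)
# match(): the match must begin at position 0 but may end early.
print(number.match('id 42'))             # None
print(number.match('42 apples').span())  # (0, 2)
# fullmatch(): the match must span the entire input.
print(number.fullmatch('42 apples'))     # None
print(number.fullmatch('42').span())     # (0, 2)
```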
msg74688 - Author: Tom Lynn (tlynn) Date: 2008-10-13 14:46
Yes, that's right. The binary aspect of it was something of a red
herring, I'm afraid, although I still think that (or parsing in general)
is an important use case. The prime motivation is that it's easy to
either forget the '\Z' or to use '$' instead, which both cause subtle
bugs. An exact() method might help to avoid that.
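A small illustration of the two mistakes described above, using hypothetical validator functions (the names are invented for this example) for an all-digits field. Both loose versions accept input that the strict one rejects.

```python
import re

def is_digits_no_terminator(s):
    # Bug: with no terminator, match() accepts anything that merely starts with digits.
    return re.match(r'\d+', s) is not None

def is_digits_dollar(s):
    # Subtler bug: '$' also matches just before a trailing newline.
    return re.match(r'\d+$', s) is not None

def is_digits_strict(s):
    # '\Z' matches only at the very end of the string.
    return re.match(r'\d+\Z', s) is not None

for candidate in ['123', '123abc', '123\n']:
    print(repr(candidate),
          is_digits_no_terminator(candidate),
          is_digits_dollar(candidate),
          is_digits_strict(candidate))
```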
msg116676 - Author: Matthew Barnett (mrabarnett) * Date: 2010-09-17 16:08
Does this request still stand? If so then I'll add it to the new regex module.
msg116688 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-09-17 17:27
I would say you should make the call on whether or not it is worth adding.  IIUC it would mean there was more than one way to do something (\Z vs 'exact'), so I personally am -0 on the feature request.  But I'm not a frequent regex user, so I don't think my opinion should count for much.
msg116724 - Author: Tom Lynn (tlynn) Date: 2010-09-17 21:48
I don't know whether it should stand, I'm somewhere around 0 on it myself. So I guess that means it shouldn't, since it's easier to add features than remove them. The problem is that once you're aware of the need for it you need it less.

In case other people are +1, I'll note that "exact" isn't a very nice name either, not being a verb. "exact_match" is a bit long but probably better (and better than "match_exact").
msg116755 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-09-18 06:34
I'm not sure it really is so useful that it warrants a new regex method.
msg116764 - Author: Tom Lynn (tlynn) Date: 2010-09-18 11:42
I'm still unsure.  I think this confusion does cause bugs in real-world code.  Perhaps more prominence for \A and \Z in the docs?  There's already a section comparing regexps starting '^' with match under "Matching vs Searching".
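The "Matching vs Searching" point about '^' can be sketched as follows; the input string is arbitrary. In MULTILINE mode match() still only tries position 0, search() lets '^' match after each newline, and '\A' means the start of the string regardless of flags.

```python
import re

text = 'header\nfoo'

# match() only tries position 0, even with MULTILINE, so this fails.
via_match = re.match('^foo', text, re.MULTILINE)
# search() with MULTILINE lets '^' match just after each newline.
via_search = re.search('^foo', text, re.MULTILINE)
# '\A' always means the start of the string, whatever the flags.
via_anchor = re.search(r'\Afoo', text, re.MULTILINE)

print(via_match, via_search.span(), via_anchor)  # None (7, 10) None
```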

The problem is basically that ^ and $ have weird semantics but are better recognised than \A and \Z.  Looking over the docs again I see that the docs for $ are still misleading, in a way that's related to this issue:

    foo matches both 'foo' and 'foobar', while the regular
    expression foo$ matches only 'foo'.

"foo$ matches only 'foo' (out of 'foo' and 'foobar')" is the correct interpretation of that, but it's easy to read it as "foo$ means exact_match('foo')", which is the misconception I was hoping to put to rest with this (foo$ also matches the 'foo' part of 'foo\nbar', even with flags=0).
msg116765 - Author: Tom Lynn (tlynn) Date: 2010-09-18 11:57
Actually, looking at the second part of the docs for $ (on "foo.$") makes me think the main motivating case here may be a bug in re.match::

    >>> re.match('foo$', 'foo\n\n')
    >>> re.match('foo$', 'foo\n')
    <_sre.SRE_Match object at 0x00A98678>

Shortening an input string shouldn't ever cause it to match, should it?
msg116771 - Author: Tom Lynn (tlynn) Date: 2010-09-18 12:51
Oh dear, I'm wrong on two fronts (I wish Roundup had post editing).

a) foo$ doesn't match the 'foo' part of 'foo\nbar' as I stated above, but does match the 'foo' part of 'foo\n'.
b) Obviously shortening an input string can cause it to match.  It's still weird though.
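The examples from the last few messages can be rechecked directly. This sketch only restates them, using the default flags except where MULTILINE is passed explicitly.

```python
import re

# '$' without MULTILINE: end of string, or just before a newline that ends it.
one_newline = re.match('foo$', 'foo\n')     # matches: the newline is the final character
two_newlines = re.match('foo$', 'foo\n\n')  # None: another newline follows the first
mid_string = re.search('foo$', 'foo\nbar')  # None: 'foo' is followed by more text

# With MULTILINE, '$' also matches just before every newline.
multiline = re.search('foo$', 'foo\nbar', re.MULTILINE)

print(one_newline.span(), two_newlines, mid_string, multiline.span())
# (0, 3) None None (0, 3)
```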
msg116833 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-18 22:45
Can we close this one?
msg116837 - Author: Matthew Barnett (mrabarnett) * Date: 2010-09-18 23:03
'$' matches at the end of the string or just before a newline at the end of the string (if multiline mode isn't turned on). '\Z' matches only at the end of the string.

If not even the OP is convinced of the need, then I have no objection to closing.
msg116843 - Author: Tom Lynn (tlynn) Date: 2010-09-19 00:12
(Sorry to comment on a closed issue; it was closed as I was writing this.)  It's not that I'm not convinced of the need, just not of the solution.  I still think that there are problems here:

a) forgetting any \Z or $ terminator to .match() is easy,
b) $ is easily misunderstood (and not just by me) and I suspect commonly dangerously misused in validation routines as a result,
c) '(?:%s)\Z' % regexp is noisy, combines two less-understood features, and makes simple regexps hard to read,
d) '(?:%s)\Z' % regexp.pattern requires recompilation of the regexp.
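Point (d) can be made concrete with a hypothetical helper `anchor()` (invented for this example) that rebuilds an end-anchored regex from an already compiled one, reusing its flags. It is a sketch only: it handles str patterns, and a pattern starting with an inline flag group such as `(?i)` would end up with that group no longer at the start, which newer Python versions reject. `fullmatch()` (Python 3.4+) avoids the second compile altogether.

```python
import re

def anchor(regex):
    """Hypothetical helper: a new compiled regex that must also end at the end of input."""
    # The original pattern text is wrapped and compiled a second time;
    # flags such as IGNORECASE are carried over from the compiled object.
    return re.compile(r'(?:' + regex.pattern + r')\Z', regex.flags)

word = re.compile('[a-z]+', re.IGNORECASE)
exact_word = anchor(word)

print(exact_word.match('Hello') is not None)   # True
print(exact_word.match('Hello!') is not None)  # False
# fullmatch() gives the same answer without recompiling.
print(word.fullmatch('Hello!') is None)        # True
```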

I think another method is probably the best solution to these, but it may have too much cost (though I'm not sure what that cost would be).

Largely orthogonally, I'd like to see \Z encouraged over $ in the docs, and preferably a version of this table (probably under Matching vs Searching), corrected if I'm wrong of course:

    Without re.MULTILINE:
        '^' is equivalent to '\A'
        '$' is equivalent to '(?:\Z|(?=\n\Z))'

    With re.MULTILINE:
        '^' is equivalent to '(?:\A|(?<=\n))'
        '$' is equivalent to '(?:\Z|(?=\n))'

But the docs already try to express the above table (or its correction) in English, so you may feel it wouldn't add anything, in which case I'd still like to see any corrections for my own edification if possible.
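One way to check the table above empirically is to compare, over a few sample strings, the positions where each anchor matches with the positions where its proposed expansion matches. This is a sketch rather than a proof: the sample strings are arbitrary, and it assumes the first pair of rows describes the default mode and the second pair re.MULTILINE.

```python
import re

def positions(pattern, text, flags=0):
    # Start offsets of every (possibly empty) match of `pattern` in `text`.
    return [m.start() for m in re.finditer(pattern, text, flags)]

# (anchor, proposed expansion, flags for the anchor) for each table row.
rows = [
    ('^', r'\A', 0),
    ('$', r'(?:\Z|(?=\n\Z))', 0),
    ('^', r'(?:\A|(?<=\n))', re.MULTILINE),
    ('$', r'(?:\Z|(?=\n))', re.MULTILINE),
]

samples = ['', '\n', 'foo', 'foo\n', 'foo\n\n', 'foo\nbar', '\nfoo\nbar\n']

for anchor, expansion, flags in rows:
    # The expansions contain no '^' or '$', so they are compiled without flags.
    agree = all(positions(anchor, s, flags) == positions(expansion, s) for s in samples)
    mode = 'MULTILINE' if flags else 'default'
    print(f'{anchor} ({mode}): {agree}')
```

For example, `positions('$', 'foo\n')` gives `[3, 4]`: '$' matches both before the final newline and at the very end.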
msg230856 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-08 13:26
Was implemented as fullmatch() in issue16203.
Date                 User              Action  Args
2014-11-08 13:26:22  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg230856
                                               resolution: rejected -> duplicate
                                               superseder: Proposal: add re.fullmatch() method
2010-09-19 00:13:00  tlynn             set     messages: + msg116843
2010-09-18 23:34:27  rhettinger        set     status: open -> closed
                                               resolution: rejected
2010-09-18 23:03:54  mrabarnett        set     messages: + msg116837
2010-09-18 22:45:43  rhettinger        set     nosy: + rhettinger
                                               messages: + msg116833
2010-09-18 12:51:35  tlynn             set     messages: + msg116771
2010-09-18 11:57:25  tlynn             set     messages: + msg116765
2010-09-18 11:42:17  tlynn             set     messages: + msg116764
2010-09-18 06:34:46  georg.brandl      set     nosy: + georg.brandl
                                               messages: + msg116755
2010-09-17 21:48:39  tlynn             set     messages: + msg116724
2010-09-17 17:27:22  r.david.murray    set     assignee: niemeyer ->
                                               messages: + msg116688
                                               nosy: + r.david.murray
2010-09-17 16:08:11  mrabarnett        set     messages: + msg116676
2010-09-17 15:49:01  BreamoreBoy       set     nosy: + mrabarnett
2010-08-21 23:23:23  georg.brandl      set     versions: + Python 3.2, - Python 2.7
2008-10-13 14:46:41  tlynn             set     messages: + msg74688
2008-10-13 13:31:13  timehorse         set     messages: + msg74685
2008-09-28 19:31:24  timehorse         set     nosy: + timehorse
                                               versions: + Python 2.7
2007-04-27 11:35:43  tlynn             create