classification
Title: Proposal: re.prefixmatch method (alias for re.match)
Type: enhancement Stage: needs patch
Components: Library (Lib), Regular Expressions Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, gregory.p.smith, matthew.suozzo, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-11-13 19:27 by gregory.p.smith, last changed 2020-11-15 00:27 by matthew.suozzo.

Messages (4)
msg380928 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-13 19:27
A well-known anti-pattern in Python is the use of re.match when you meant to use re.search.

re.fullmatch was added in 3.4 via https://bugs.python.org/issue16203 for similar reasons.

re.prefixmatch would be similar: we want the re.match behavior, but want the code to be obvious about its intent.  This documents the implicit ^ in the name.
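For readers less familiar with the three existing entry points, the difference can be sketched as follows. The sample string and variable names are made up for illustration.

```python
import re

text = "error: disk full"  # illustrative input, not from the issue

# re.match only tries the pattern at position 0 (the implicit anchor).
at_start = re.match(r"disk", text)

# re.search scans forward and returns the first match anywhere.
anywhere = re.search(r"disk", text)

# re.fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r"disk", text)

print(at_start, anywhere, whole)
```

Only the re.search call finds "disk" here; the other two return None because the string does not start with, or consist entirely of, "disk".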

The goal would be to allow linters to ultimately flag re.match as the anti-pattern when in 3.10+ mode, asking people to use re.prefixmatch or re.search instead.
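As a rough idea of what such a lint check might look like, here is a minimal sketch built on the standard ast module. The function name find_re_match_calls is hypothetical, and the check only catches calls written literally as re.match(...); it does not follow import aliases or compiled pattern objects.

```python
import ast


def find_re_match_calls(source: str) -> list:
    """Return line numbers of calls spelled literally as re.match(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "match"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative input: one re.match call on line 2, one re.search call on line 3.
sample = "import re\nm = re.match('a', s)\nn = re.search('a', s)\n"
flagged = find_re_match_calls(sample)
print(flagged)
```

A real linter would need to resolve names such as `from re import match` before reporting, which this sketch deliberately leaves out.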

This would help avoid bugs where people mean re.search but write re.match.

The implementation is trivial.
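To illustrate how small the change could be, here is a sketch of the alias as it might look in user code. This is an assumption about the shape of the change, not the actual patch: the name prefixmatch is bound in the local module, and re itself is left untouched.

```python
import re

# Hypothetical stand-in for the proposed name: same function object as
# re.match, just spelled so that the anchoring is visible at the call site.
prefixmatch = re.match

hit = prefixmatch(r"\d+", "123abc")   # digits at the start: matches "123"
miss = prefixmatch(r"\d+", "abc123")  # digits not at the start: None

print(hit.group(), miss)
```

A complete change would presumably also need a matching method on compiled pattern objects, which this sketch does not cover.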

This is **not** a decision to deprecate the re.match name, which is widely used in 25 years' worth of code.  That would be painful and is unlikely to be worth doing without spreading it over an 8+ year timeline.
msg380975 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-14 12:02
I have seen code that uses re.search() with an ^ anchor instead of re.match(), but I have never seen code that uses re.match() instead of re.search(). It just won't work unless you add an explicit ".*" or ".*?" at the start of the pattern, and that is a clear indication that re.match() matches at the start of the string.
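One caveat when comparing the two spellings: re.search() with ^ and re.match() agree by default, but they diverge under re.MULTILINE. A small sketch, using a made-up two-line string:

```python
import re

text = "first\nsecond"  # illustrative input

# Without flags, ^ only matches at the start of the string, so both fail.
search_plain = re.search(r"^second", text)
match_plain = re.match(r"second", text)

# With re.MULTILINE, ^ also matches right after each newline, so search
# finds the second line. re.match still only tries position 0.
search_multi = re.search(r"^second", text, re.MULTILINE)
match_multi = re.match(r"^second", text, re.MULTILINE)

print(search_plain, match_plain, search_multi, match_multi)
```

Only search_multi is a match object here; the other three results are None.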
msg380984 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-14 18:22
My point is that using re.match is a common bug when people really want re.search.

re.prefixmatch makes it explicit and non-confusing and thus unlikely to be used wrong or misunderstood when read or reviewed.

The term "match" when talking about regular expressions is not normally meant to imply any anchoring, as anchors can be expressed within the regex.  Python is relatively unusual in bothering to have different methods for a prefix match and an anywhere match.  (We'd have been better off without a match method entirely, having only search. Too late now.)
msg380996 - Author: Matthew Suozzo (matthew.suozzo) * Date: 2020-11-15 00:27
> It just won't work unless you add explicit ".*" or ".*?" at the start of the pattern

But think of when regexes are used for validating input. Getting it to "just work" may be over-permissive validation that only actually checks the beginning of the input. They're one missed test case away from a crash or, worse, a security issue.

This proposed name change would help make the function behavior obvious at the callsite. In the validator example, calling "prefixmatch" would stand out as wrong to even the most oblivious, documentation-averse user.
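The validator scenario can be sketched like this. Both function names and the payload are hypothetical; the point is only that a check built on re.match accepts any input that merely starts with valid text.

```python
import re


def is_digits_loose(value: str) -> bool:
    # Bug: only checks that the input *starts* with 1-5 digits.
    return re.match(r"\d{1,5}", value) is not None


def is_digits_strict(value: str) -> bool:
    # Requires the entire input to be 1-5 digits.
    return re.fullmatch(r"\d{1,5}", value) is not None


payload = "8080; rm -rf /"  # illustrative hostile input

print(is_digits_loose(payload), is_digits_strict(payload))
```

A test suite that only feeds the loose validator well-formed values such as "8080" would never notice the difference.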

> My point is that re.match is a common bug when people really want re.search.

While I think better distinguishing the interfaces is a nice-to-have for usability, I think it has more absolute benefit to correctness. Again, confusion around the semantics of "match" was the motivation for adding "fullmatch" in the first place, but that change only went so far to address the problem: it's still too easy to misuse the existing "match" interface, and it's not realistic to remove it from the language. A new name would eliminate this class of error at a very low cost.
History
Date | User | Action | Args
2020-11-15 00:27:33 | matthew.suozzo | set | nosy: + matthew.suozzo; messages: + msg380996
2020-11-14 18:22:46 | gregory.p.smith | set | messages: + msg380984
2020-11-14 12:02:36 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg380975
2020-11-14 09:16:16 | xtreak | set | nosy: + ezio.melotti, mrabarnett; components: + Regular Expressions
2020-11-13 19:27:03 | gregory.p.smith | create |