This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Proposal: re.prefixmatch method (alias for re.match)
Type: enhancement Stage: patch review
Components: Library (Lib), Regular Expressions Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: ezio.melotti, gregory.p.smith, matthew.suozzo, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-11-13 19:27 by gregory.p.smith, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 31137 open gregory.p.smith, 2022-02-05 02:12
Messages (6)
msg380928 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-13 19:27
A well-known anti-pattern in Python is using re.match when you meant to use re.search.

re.fullmatch was added in 3.4 via https://bugs.python.org/issue16203 for similar reasons.

re.prefixmatch would be similar: we want the re.match behavior, but want the code to be obvious about its intent.  This documents the implicit ^ in the name.
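
A minimal sketch of the three anchoring behaviours under discussion, using only functions that already exist in the re module. The pattern and sample strings are made up for illustration.

```python
import re

pattern = r"\d+"
text = "order 42 shipped"

# re.match only succeeds if the pattern matches at position 0.
at_start = re.match(pattern, text)

# re.search scans the string and returns the first match anywhere.
anywhere = re.search(pattern, text)

# re.fullmatch requires the entire string to match the pattern.
whole = re.fullmatch(pattern, "42")
partial_whole = re.fullmatch(pattern, "42 shipped")

print(at_start)          # None
print(anywhere.group())  # 42
print(whole.group())     # 42
print(partial_whole)     # None
```

Someone who wrote re.match here while meaning re.search would silently get None for this input, which is the bug class the proposed name is meant to make visible.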

The goal would be to allow linters to ultimately flag re.match as the anti-pattern when in 3.10+ mode, asking people to use re.prefixmatch or re.search instead.

This would help avoid bugs where people mean re.search but write re.match.

The implementation is trivial.
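
As a rough sketch of why the implementation is trivial, the alias can be modelled in user code as below. This is not the code from the linked pull request; the standalone prefixmatch function here is a hypothetical stand-in defined outside the re module, and it simply forwards to re.match.

```python
import re

def prefixmatch(pattern, string, flags=0):
    """Hypothetical alias: same behaviour as re.match, clearer name."""
    return re.match(pattern, string, flags)

hit = prefixmatch(r"[a-z]+", "abc123")
miss = prefixmatch(r"[0-9]+", "abc123")

print(hit.group())  # abc
print(miss)         # None
```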

This is **not** a decision to deprecate the re.match name, which is widely used in 25 years' worth of code.  That'd be painful and is unlikely to be worth doing without spreading it over an 8+ year timeline.
msg380975 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-14 12:02
I have seen code which uses re.search() with the anchor ^ instead of re.match(), but I have never seen code which uses re.match() instead of re.search(). It just won't work unless you add explicit ".*" or ".*?" at the start of the pattern, and it is a clear indication that re.match() matches the start of the string.
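
To illustrate the point above with the existing API: re.search() with an explicit anchor behaves like re.match(), and re.match() only reaches text further into the string when the pattern starts with something like ".*?". The sample text is invented. One caveat worth noting: under re.MULTILINE, ^ also matches after each newline, so \A is the safer anchor when this equivalence matters.

```python
import re

text = "first line\nsecond line"

# An anchored search and a plain match agree on a prefix.
a = re.search(r"^first", text)
b = re.match(r"first", text)

# match cannot see "second" unless the pattern skips ahead explicitly.
# re.DOTALL is needed here so that "." can cross the newline.
c = re.match(r"second", text)
d = re.match(r".*?second", text, re.DOTALL)

# With MULTILINE, ^ also matches right after a newline, so search finds
# the second line; \A anchors only at the very start of the string.
e = re.search(r"^second", text, re.MULTILINE)
f = re.search(r"\Asecond", text, re.MULTILINE)

print(a.span(), b.span())  # (0, 5) (0, 5)
print(c)                   # None
print(d.end())             # 17
print(e.start())           # 11
print(f)                   # None
```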
msg380984 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-14 18:22
My point is that re.match is a common bug when people really want re.search.

re.prefixmatch makes it explicit and non-confusing and thus unlikely to be used wrong or misunderstood when read or reviewed.

The term "match" when talking about regular expressions is not normally meant to imply any anchoring, as anchors can be expressed within the regex.  Python is unusual in bothering to have different methods for a prefix match and an anywhere match.  (We'd have been better off without a match method entirely, only having search - too late now.)
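
The remark that anchors can be expressed within the regex can be sketched as follows: a single search call covers all three behaviours once the pattern carries its own anchors. The helper names are hypothetical, the (?:...) wrapper keeps alternations inside the anchors, and the sketch assumes the pattern contains no global inline flags such as (?i).

```python
import re

def search_prefix(pattern, string):
    # Hypothetical helper: emulates re.match using search plus \A.
    return re.search(r"\A(?:" + pattern + r")", string)

def search_full(pattern, string):
    # Hypothetical helper: emulates re.fullmatch using search, \A and \Z.
    return re.search(r"\A(?:" + pattern + r")\Z", string)

pattern = r"cat|dog"

print(search_prefix(pattern, "dogma").group())  # dog
print(search_full(pattern, "dogma"))            # None
print(search_full(pattern, "dog").group())      # dog
print(re.search(pattern, "hotdog").group())     # dog
```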
msg380996 - Author: Matthew Suozzo (matthew.suozzo) * Date: 2020-11-15 00:27
> It just won't work unless you add explicit ".*" or ".*?" at the start of the pattern

But think of when regexes are used for validating input. Getting it to "just work" may be over-permissive validation that only actually checks the beginning of the input. They're one missed test case away from a crash or, worse, a security issue.

This proposed name change would help make the function behavior obvious at the callsite. In the validator example, calling "prefixmatch" would stand out as wrong to even the most oblivious, documentation-averse user.
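
A small sketch of the validator pitfall described above. The function names and the digits-only rule are hypothetical; the point is only that re.match accepts any input with a valid prefix, while re.fullmatch checks the whole input.

```python
import re

def is_valid_id_buggy(value):
    # Only checks that the input *starts* with digits.
    return re.match(r"\d+", value) is not None

def is_valid_id(value):
    # Requires the entire input to consist of digits.
    return re.fullmatch(r"\d+", value) is not None

payload = "123; DROP TABLE users"

print(is_valid_id_buggy(payload))  # True
print(is_valid_id(payload))        # False
print(is_valid_id("123"))          # True
```

A test suite that only feeds well-formed IDs to the buggy version would pass, which is the "one missed test case away" situation mentioned above.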

> My point is that re.match is a common bug when people really want re.search.

While I think better distinguishing the interfaces is a nice-to-have for usability, I think it has more absolute benefit to correctness. Again, confusion around the semantics of "match" was the motivation for adding "fullmatch" in the first place, but that change only went so far to address the problem: it's still too easy to misuse the existing "match" interface and it's not realistic to remove it from the language. A new name would eliminate this class of error at a very low cost.
msg412554 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2022-02-05 02:36
What do other APIs in widely used languages do with regex terminology?  We appear to be the only popular language that anchors to the start of a string with an API named "match".

libpcre C: uses "match" to mean what we call "search" - https://www.pcre.org/current/doc/html/pcre2_match.html

Go: Uses "Match" to mean what we call "search" - https://pkg.go.dev/regexp#Match

JavaScript: Uses "match" to mean what we call "search" - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match

Java: Uses "matches" (I think meaning what we call fullmatch?) - https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

C++ RE2: explicit "FullMatch" and "PartialMatch" APIs - https://github.com/google/re2 

Java re2j: uses "matches" like Java regex.Pattern - https://github.com/google/re2j

Ruby: Uses "match" as we do "search" - https://ruby-doc.org/core-2.4.0/Regexp.html

Rust: Uses match as we do "search" - https://docs.rs/regex/latest/regex/
msg412564 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-02-05 08:57
I am not convinced. What are examples of using re.match() instead of re.search()? How common is this type of error?

There are perhaps many millions of scripts which use re.match(). Deprecating re.match() at any time in the future would be very destructive, and keeping an alias indefinitely would only add more confusion.
History
Date User Action Args
2022-04-11 14:59:38 admin set github: 86519
2022-02-05 08:57:42 serhiy.storchaka set messages: + msg412564
2022-02-05 02:36:29 gregory.p.smith set messages: + msg412554
2022-02-05 02:12:56 gregory.p.smith set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request29314
2022-02-04 23:13:35 gregory.p.smith set assignee: gregory.p.smith; versions: + Python 3.11, - Python 3.10
2020-11-15 00:27:33 matthew.suozzo set nosy: + matthew.suozzo; messages: + msg380996
2020-11-14 18:22:46 gregory.p.smith set messages: + msg380984
2020-11-14 12:02:36 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg380975
2020-11-14 09:16:16 xtreak set nosy: + ezio.melotti, mrabarnett; components: + Regular Expressions
2020-11-13 19:27:03 gregory.p.smith create