Title: Proposal: re.prefixmatch method (alias for re.match)
Type: enhancement Stage: patch review
Components: Library (Lib), Regular Expressions Versions: Python 3.11
Status: open Resolution:
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: ezio.melotti, gregory.p.smith, matthew.suozzo, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2020-11-13 19:27 by gregory.p.smith, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 31137 open gregory.p.smith, 2022-02-05 02:12
Messages (6)
msg380928 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-13 19:27
A well-known anti-pattern in Python is the use of re.match when you meant to use re.search.

re.fullmatch was added in 3.4 for similar reasons.

re.prefixmatch would be similar: we want the re.match behavior, but want the code to be obvious about its intent.  This documents the implicit ^ in the name.

The goal would be to allow linters to ultimately flag re.match as an anti-pattern when in 3.10+ mode, asking people to use re.prefixmatch or re.search instead.

This would help avoid bugs where people mean re.search but write re.match.

The implementation is trivial.
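
A minimal sketch of what such an alias could look like, written outside the standard library purely for illustration. The `prefixmatch` wrapper below is an assumption about the shape of the change, not the contents of the linked PR; it relies only on the existing re.match and re.search functions.

```python
import re


def prefixmatch(pattern, string, flags=0):
    """Hypothetical alias: behaves exactly like re.match (anchored at the start)."""
    return re.match(pattern, string, flags)


# Succeeds because the digits begin at position 0.
assert prefixmatch(r"\d+", "123abc").group() == "123"

# Fails: the string contains digits, but not at the start.
assert prefixmatch(r"\d+", "abc123") is None

# re.search, by contrast, finds the digits anywhere in the string.
assert re.search(r"\d+", "abc123").group() == "123"
```

The only thing the new name changes is what the call site communicates to a reader; the matching behavior is the same as re.match.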

This is **not** a decision to deprecate the re.match name, which is widely used in 25 years' worth of code.  That would be painful and is unlikely to be worth doing without spreading it over an 8+ year timeline.
msg380975 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-14 12:02
I have seen code which uses re.search() with the anchor ^ instead of re.match(), but I have never seen code which uses re.match() instead of re.search(). It just won't work unless you add an explicit ".*" or ".*?" at the start of the pattern, and that is a clear indication that re.match() matches at the start of the string.
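
To make the distinction concrete, here is a short illustration using only existing re functions. The sample text and patterns are made up for this example.

```python
import re

text = "error: disk full"

# re.match only succeeds when the pattern matches at position 0.
assert re.match(r"disk", text) is None

# re.search scans the whole string for the first match.
assert re.search(r"disk", text).start() == 7

# Prefixing ".*?" lets re.match reach the word, but the skipped
# prefix becomes part of the overall match.
m = re.match(r".*?disk", text)
assert m.group() == "error: disk"

# Conversely, re.search with a leading ^ (no MULTILINE flag) finds
# the same span as re.match.
assert re.search(r"^error", text).span() == re.match(r"error", text).span()
```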
msg380984 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-11-14 18:22
My point is that re.match is a common bug when people really want re.search.

re.prefixmatch makes it explicit and non-confusing and thus unlikely to be used wrong or misunderstood when read or reviewed.

The term "match" when talking about regular expressions is not normally meant to imply any anchoring as anchors can be expressed within the regex.  Python is relatively unique in bothering to have different methods for a prefix match and an anywhere match.  (We'd have been better off without a match method entirely, only having search - too late now)
msg380996 - (view) Author: Matthew Suozzo (matthew.suozzo) * Date: 2020-11-15 00:27
> It just won't work unless you add explicit ".*" or ".*?" at the start of the pattern

But think of when regexes are used for validating input. Getting it to "just work" may be over-permissive validation that only actually checks the beginning of the input. They're one missed test case away from a crash or, worse, a security issue.

This proposed name change would help make the function behavior obvious at the callsite. In the validator example, calling "prefixmatch" would stand out as wrong to even the most oblivious, documentation-averse user.
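
A sketch of the validator scenario, assuming a hypothetical username rule (the `USERNAME` pattern and both helper functions are invented for illustration and do not come from any real codebase):

```python
import re

# Hypothetical validation rule: lowercase letters, digits, underscores.
USERNAME = re.compile(r"[a-z0-9_]+")


def is_valid_loose(value):
    # Bug: match() only checks that the input *starts* with valid characters.
    return USERNAME.match(value) is not None


def is_valid_strict(value):
    # fullmatch() requires the entire input to satisfy the pattern.
    return USERNAME.fullmatch(value) is not None


payload = "alice; rm -rf /"

# Accepted, although everything after "alice" was never checked.
assert is_valid_loose(payload)

# Rejected, because the whole string must match.
assert not is_valid_strict(payload)

# A well-formed value passes both checks.
assert is_valid_loose("alice_01") and is_valid_strict("alice_01")
```

A test suite that only tries well-formed usernames would pass with either helper, which is how the loose version can survive review.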

> My point is that re.match is a common bug when people really want re.search.

While I think better distinguishing the interfaces is a nice-to-have for usability, I think it has more absolute benefit to correctness. Again, confusion around the semantics of "match" was the motivation for adding "fullmatch" in the first place, but that change only went so far in addressing the problem: it's still too easy to misuse the existing "match" interface, and it's not realistic to remove it from the language. A new name would eliminate this class of error at a very low cost.
msg412554 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2022-02-05 02:36
What do other APIs in widely used languages do with regex terminology?  We appear to be the only popular language that anchors to the start of a string with an API named "match".

libpcre C: Uses "match" to mean what we call "search"

Go: Uses "Match" to mean what we call "search"

JavaScript: Uses "match" to mean what we call "search"

Java: Uses "matches" (I think meaning what we call fullmatch?)

C++ RE2: Explicit "FullMatch" and "PartialMatch" APIs

Java re2j: Uses "matches" like Java regex.Pattern

Ruby: Uses "match" as we do "search"

Rust: Uses "match" as we do "search"
msg412564 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-02-05 08:57
I am not convinced. What are examples of using re.match() instead of re.search()? How common is this type of error?

There are perhaps many millions of scripts which use re.match(). Deprecating re.match() at any time in the future would be very destructive, and keeping an alias indefinitely would only add more confusion.
Date User Action Args
2022-04-11 14:59:38 admin set github: 86519
2022-02-05 08:57:42 serhiy.storchaka set messages: + msg412564
2022-02-05 02:36:29 gregory.p.smith set messages: + msg412554
2022-02-05 02:12:56 gregory.p.smith set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request29314
2022-02-04 23:13:35 gregory.p.smith set assignee: gregory.p.smith
versions: + Python 3.11, - Python 3.10
2020-11-15 00:27:33 matthew.suozzo set nosy: + matthew.suozzo
messages: + msg380996
2020-11-14 18:22:46 gregory.p.smith set messages: + msg380984
2020-11-14 12:02:36 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg380975
2020-11-14 09:16:16 xtreak set nosy: + ezio.melotti, mrabarnett
components: + Regular Expressions
2020-11-13 19:27:03 gregory.p.smith create