This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Do not cache re.compile
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.4
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, abcdenis, christian.heimes, ezio.melotti, flox, mrabarnett, neologix, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-03-16 21:08 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

File name Uploaded Description Edit
re_compile_nocache.patch serhiy.storchaka, 2013-03-16 21:29 review
Messages (12)
msg184354 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-16 21:08
Ezio proposed in issue16389 to not cache re.compile. Caching of re.compile has no sense and only pollutes the cache.
msg184356 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-16 21:29
Here is a patch.
msg184360 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-03-16 22:48
I'm not sure I agree. I've seen plenty of code that called re.compile() over and over again -- or called it with a computed string that would have only a small number of possible values.
msg185048 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-23 13:56
I think we could happily call such code buggy or at least suboptimal. The docs don't even mention that re.compile() actually uses a cache.
msg185122 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-24 07:57
> The docs don't even mention that re.compile() actually uses a cache.

Actually it does:
re.compile(pattern, flags=0)

Note The compiled versions of the most recent patterns passed to re.match(), or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.

Now, I agree that it's definitely suboptimal...
msg201479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-27 17:49
So what is a decision?
msg201522 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-28 10:34
> So what is a decision?

Is there a clear performance benefit in removing the caching?

If the problem is cache pollution perhaps re.compile can get a separate cache :-)
msg201524 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-10-28 11:01
I don't like the idea that you want to remove a feature without a deprecation period.
msg201526 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-28 11:50
I'm surprised, but perhaps performance benefit actually exists (check this on other computers).

### regex_effbot ###
Min: 0.333525 -> 0.325349: 1.03x faster
Avg: 0.342451 -> 0.331665: 1.03x faster
Significant (t=12.13)
Stddev: 0.00606 -> 0.00651: 1.0738x larger

However the main benefit is that non-cached constructor (re.compile is a constructor of regular expression object) is expected to be non-cached, while other module level functions can be cached. See for example struct.Struct. We can cache non-cached constructor explicitly (as in fnmatch), but when a constructor already cached, adding new cache can only add overhead.
msg289195 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-08 02:19
Can this be closed.  Caching of regexes has been around for a very long time and I expect that a lot of code depends on it.  This should not be washed away without considerable discussion.
msg289197 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2017-03-08 02:46
If we were doing it today, maybe we wouldn't cache them, but, as you say, it's been like that for a long time. (The regex module also caches them, because the re module does.) Unless someone can demonstrate that it's a problem, I'd say just leave it as it is.
msg387568 - (view) Author: (abcdenis) Date: 2021-02-23 13:08
regex library now supports avoiding cache:
Date User Action Args
2022-04-11 14:57:42adminsetgithub: 61643
2021-02-23 13:08:07abcdenissetnosy: + abcdenis
messages: + msg387568
2017-03-08 07:21:27serhiy.storchakasetstatus: open -> closed
resolution: rejected
stage: patch review -> resolved
2017-03-08 02:46:50mrabarnettsetmessages: + msg289197
2017-03-08 02:19:28rhettingersetstatus: pending -> open
nosy: + rhettinger
messages: + msg289195

2017-03-07 15:47:54gvanrossumsetstatus: open -> pending
2017-03-07 15:47:31gvanrossumsetstatus: pending -> open
nosy: - gvanrossum
2017-03-07 15:42:00serhiy.storchakasetstatus: open -> pending
2013-10-28 23:14:22Arfreversetnosy: + Arfrever
2013-10-28 11:50:13serhiy.storchakasetmessages: + msg201526
2013-10-28 11:01:35christian.heimessetnosy: + christian.heimes
messages: + msg201524
2013-10-28 10:34:18pitrousetmessages: + msg201522
2013-10-27 17:49:31serhiy.storchakasetmessages: + msg201479
2013-03-24 13:35:44floxsetnosy: + flox
2013-03-24 07:57:45neologixsetnosy: + neologix
messages: + msg185122
2013-03-23 13:56:05pitrousetmessages: + msg185048
2013-03-16 22:48:33gvanrossumsetnosy: + gvanrossum
messages: + msg184360
2013-03-16 21:29:18serhiy.storchakasetfiles: + re_compile_nocache.patch
keywords: + patch
messages: + msg184356

stage: needs patch -> patch review
2013-03-16 21:08:56serhiy.storchakacreate