classification
Title: Do not cache re.compile
Type: enhancement
Stage: patch review
Components: Library (Lib), Regular Expressions
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, christian.heimes, ezio.melotti, flox, gvanrossum, mrabarnett, neologix, pitrou, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-03-16 21:08 by serhiy.storchaka, last changed 2013-10-28 23:14 by Arfrever.

Files
File name Uploaded
re_compile_nocache.patch serhiy.storchaka, 2013-03-16 21:29
Messages (9)
msg184354 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-03-16 21:08
In issue16389, Ezio proposed not caching re.compile. Caching the result of re.compile makes no sense and only pollutes the cache.
msg184356 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-03-16 21:29
Here is a patch.
msg184360 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2013-03-16 22:48
I'm not sure I agree. I've seen plenty of code that called re.compile() over and over again -- or called it with a computed string that would have only a small number of possible values.
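The usage pattern described above can be sketched as follows. This is only an illustration: both function names and the sample data are hypothetical, not taken from any code mentioned in this issue. The first function calls re.compile() on every iteration with a computed pattern, so it relies on the module cache; the second compiles once and would behave the same with or without that cache.

```python
import re


def count_matches_recompiling(lines, keyword):
    """Hypothetical example: re.compile() is called again for every line."""
    total = 0
    for line in lines:
        # The pattern string is computed, but takes very few distinct values.
        pattern = re.compile(r"\b%s\b" % re.escape(keyword))
        if pattern.search(line):
            total += 1
    return total


def count_matches_hoisted(lines, keyword):
    """Same result, but the pattern is compiled once, outside the loop."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword))
    return sum(1 for line in lines if pattern.search(line))


sample_lines = [
    "error: disk full",
    "warning: low memory",
    "error: timeout",
    "errors happen",
]
recompiled_count = count_matches_recompiling(sample_lines, "error")
hoisted_count = count_matches_hoisted(sample_lines, "error")
```

Without the cache, only the first variant would pay the compilation cost on each iteration.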
msg185048 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-03-23 13:56
I think we could happily call such code buggy or at least suboptimal. The docs don't even mention that re.compile() actually uses a cache.
msg185122 - Author: Charles-François Natali (neologix) (Python committer) Date: 2013-03-24 07:57
> The docs don't even mention that re.compile() actually uses a cache.

Actually they do:
"""
re.compile(pattern, flags=0)

Note: The compiled versions of the most recent patterns passed to re.match(), re.search() or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
"""

Now, I agree that it's definitely suboptimal...
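The caching described in the quoted note can be observed with a short check. This is a sketch that assumes a CPython version in which re.compile() still goes through the module cache; the object identity it relies on is an implementation detail, not a documented guarantee. re.purge() is the documented way to clear the cache.

```python
import re

# Start from an empty cache so earlier patterns do not interfere.
re.purge()

first = re.compile(r"ab+c")
second = re.compile(r"ab+c")
# With the cache in effect, the second call returns the cached object.
served_from_cache = first is second

# After purging, the same pattern string is compiled again from scratch.
re.purge()
third = re.compile(r"ab+c")
recompiled_after_purge = third is not first
```

If the patch in this issue were applied, the first identity check would be expected to turn out false, since every re.compile() call would build a new pattern object.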
msg201479 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-10-27 17:49
So what is the decision?
msg201522 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-10-28 10:34
> So what is the decision?

Is there a clear performance benefit in removing the caching?

If the problem is cache pollution, perhaps re.compile can get a separate cache :-)
msg201524 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2013-10-28 11:01
I don't like the idea that you want to remove a feature without a deprecation period.
msg201526 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-10-28 11:50
I'm surprised, but there may actually be a performance benefit (please check this on other computers).

### regex_effbot ###
Min: 0.333525 -> 0.325349: 1.03x faster
Avg: 0.342451 -> 0.331665: 1.03x faster
Significant (t=12.13)
Stddev: 0.00606 -> 0.00651: 1.0738x larger

However, the main benefit is consistency: a constructor (and re.compile is the constructor of regular expression objects) is expected to be non-cached, while the other module-level functions can be cached. See, for example, struct.Struct. A non-cached constructor can be cached explicitly by the caller (as in fnmatch), but when the constructor is already cached, adding another cache only adds overhead.
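The two points in the message above can be sketched in a few lines. The helper name compile_cached and its maxsize value are hypothetical, chosen only for illustration; the approach (wrapping re.compile in functools.lru_cache) is similar in spirit to what fnmatch does, not a copy of it.

```python
import functools
import re
import struct

# struct.Struct is a plain constructor: each call builds a new object.
s1 = struct.Struct("<I")
s2 = struct.Struct("<I")
structs_are_distinct = s1 is not s2


# Hypothetical helper: an explicit, caller-owned cache around the constructor.
@functools.lru_cache(maxsize=256)
def compile_cached(pattern, flags=0):
    return re.compile(pattern, flags)


first = compile_cached(r"\d+")
second = compile_cached(r"\d+")
cache_info = compile_cached.cache_info()
```

Here the caller chooses the cache size and lifetime. If re.compile() is itself cached, the second layer duplicates the lookup, which is the overhead the message refers to.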
History
Date User Action Args
2013-10-28 23:14:22 Arfrever set nosy: + Arfrever
2013-10-28 11:50:13 serhiy.storchaka set messages: + msg201526
2013-10-28 11:01:35 christian.heimes set nosy: + christian.heimes; messages: + msg201524
2013-10-28 10:34:18 pitrou set messages: + msg201522
2013-10-27 17:49:31 serhiy.storchaka set messages: + msg201479
2013-03-24 13:35:44 flox set nosy: + flox
2013-03-24 07:57:45 neologix set nosy: + neologix; messages: + msg185122
2013-03-23 13:56:05 pitrou set messages: + msg185048
2013-03-16 22:48:33 gvanrossum set nosy: + gvanrossum; messages: + msg184360
2013-03-16 21:29:18 serhiy.storchaka set files: + re_compile_nocache.patch; keywords: + patch; messages: + msg184356; stage: needs patch -> patch review
2013-03-16 21:08:56 serhiy.storchaka create