classification
Title: Defer compiling regular expressions
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: barry Nosy List: barry, ezio.melotti, r.david.murray, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-09-26 02:10 by barry, last changed 2017-09-27 14:23 by barry. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3755 closed barry, 2017-09-26 02:14
Messages (12)
msg302995 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-09-26 02:10
It's a very common pattern to see the following at module scope:

cre_a = re.compile('some pattern')
cre_b = re.compile('other pattern')

and so on.  This can cost you at startup, because all those regular expressions are compiled at import time even if they're never used in practice (e.g. because whatever code path exercises the compiled regex is never taken).

It occurred to me that if re.compile() deferred compilation of the regexp until first use, you could speed up start up time.  But by how much?  And at what cost?

So I ran a small experiment (pull request to be submitted) using the `perf` module on `pip --help`.  I was able to cut down the number of compiles from 28 to 9, and a mean startup time from 245ms to 213ms.

% python -m perf compare_to ../base.json ../defer.json 
Mean +- std dev: [base] 245 ms +- 19 ms -> [defer] 213 ms +- 21 ms: 1.15x faster (-13%)

`pip install tox` reduces the compiles from 231 to 75:

(cpython 3.7) 231 0.06945133209228516
(3.7 w/defer)  75 0.03140091896057129

So what's the cost?  Backward compatibility.  With the patch, `re.compile()` no longer returns a compiled regular expression object, but instead a "deferred" proxy.  When the proxy is used, then it does the actual compilation.  This can break compatibility by deferring any exceptions that compile() might raise.  This happens a fair bit in the test suite, but I'm not sure it's all that common in practice.  In any case, I've also added a re.IMMEDIATE (re.N -- for "now") flag to force immediate compilation.

I also modified the compilation to use an actual functools.lru_cache.  This way, if MAXCACHE gets triggered, the entire cache won't get blown away.

So, whether this is a good idea or not, I open this and push the branch for further discussion.
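The deferred-proxy idea can be sketched roughly like this (this is not the actual patch; `_DeferredPattern` and `lazy_compile` are illustrative names):

```python
import re

class _DeferredPattern:
    """Hold the pattern string and compile it lazily, on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def __getattr__(self, name):
        # First use: compile for real, then delegate every attribute
        # lookup (search, match, sub, ...) to the compiled object.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return getattr(self._compiled, name)

def lazy_compile(pattern, flags=0):
    return _DeferredPattern(pattern, flags)

cre = lazy_compile(r"\d+")        # no compilation happens here
m = cre.search("abc 42 def")      # compiled on first use
print(m.group())                  # 42
```

Note that any compilation error in the pattern is likewise deferred to the first use, which is exactly the compatibility concern raised below.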
msg302996 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-26 02:39
ISTM, the whole point is to compile in advance.  When I worked in high-frequency trading, that was essential for news trading, where you *really* didn't want to pay the compilation cost at the time the regex was used.  This proposal takes away the user's only control over when the regex is compiled.

FWIW, if a user doesn't explicitly invoke re.compile() and instead uses a straight call to re.search(pattern, s), then the pattern is compiled on first use and cached for future use.  In other words, we already have a simple and clear way to auto-compile on first use.  I recommend against taking away the only option to specify otherwise.
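Raymond's point can be seen directly: the module-level functions already compile lazily and cache the result (in CPython the cache even hands back the identical compiled object for repeated compiles of the same pattern):

```python
import re

# No explicit compile step: the pattern is compiled on the first call
# and stored in re's internal cache, so subsequent calls reuse it.
m = re.search(r"(\d+)-(\d+)", "pages 12-34")
print(m.groups())   # ('12', '34')

# In CPython, re.compile() consults the same cache, so compiling the
# same pattern twice returns the very same object:
assert re.compile(r"(\d+)-(\d+)") is re.compile(r"(\d+)-(\d+)")
```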
msg302997 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-09-26 02:56
I agree with Raymond.  It would be strange to have the API that is obviously designed to pre-compile the regex not pre-compile the regex.

If the concern is that a non-precompiled regex might get bumped out of the cache but you want a way to only compile a "bunch of stuff" on demand, then the thing to do would be to make a proposal for a new API that does that, and we'll discuss the merits :)

It seems to me that contexts that care about startup time can use the deferred compile API, whether the new one (if it is accepted) or the existing one.  If an application cares about startup time and is using re.compile at startup, then that is an application performance bug.
msg303003 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-26 05:07
lru_cache() is not used here for good reasons.  The patch makes using cached patterns slower by adding the overhead of 4 additional checks.  It makes resolving attributes of a deferred pattern slower.  All this slows down a straight call to re.search(pattern, s).

Errors in regular expressions are now raised only on first use of the pattern.  This is a drawback from the educational and debugging points of view.

The patch also breaks warnings emitted during the compilation of regular expressions.  They now report the wrong source line, and use the wrong line for caching in the case of -Wonce.
msg303042 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-09-26 14:34
Let's separate the use of lru_cache from the deferred compilation.  I think I'll just revert the change to use lru_cache, although I'll note that the impetus for this was the observation that once MAXCACHE is reached, the entire regexp cache is purged.  That seems suboptimal, and my guess is that it was done because the current cache is just a dictionary, so there's no good way to partially purge it.  My thought there was maybe to use a collections.OrderedDict, but I'm not sure the complexity is worth it.  We can evaluate that separately.

I'm not sure R. David and Raymond noticed the addition of the re.IMMEDIATE flag.  That's exactly the way you would say "compile this regexp right now", so that option is absolutely not taken away!  My claim is that most regexps at module scope do *not* need to be compiled at import time, and doing so is a waste of resources.  For the cases where you really need it, you have it.

I did notice the warnings problem and mostly glossed over it, but for this patch to become "real" we'd have to try to restore that.

The other thought I had was this: if we can observe that most module scope re.compiles() are essentially constants, then maybe the compiler/peephole/PEP511 can help here.  It could recognize constant arguments to re.compile() and precompile them.
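The constant-argument case Barry describes is straightforward to detect.  A toy sketch using the ast module (a real optimizer would live in the compiler or an AST optimization pass; the variable names here are illustrative):

```python
import ast

source = """
import re
CRE = re.compile(r"[a-z]+")
OTHER = re.compile(prefix + "[0-9]+")   # not a constant argument; skipped
"""

# Collect every re.compile(...) call whose first argument is a
# string literal -- exactly the calls that could be precompiled.
constant_compiles = []
for node in ast.walk(ast.parse(source)):
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "compile"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and node.args
            and isinstance(node.args[0], ast.Constant)):
        constant_compiles.append(node.args[0].value)

print(constant_compiles)   # ['[a-z]+']
```

This only catches the literal `re.compile` spelling, of course; aliased imports (`from re import compile`) would need extra handling.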
msg303046 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-09-26 15:27
Precompiling as a compile-time optimization would be cool.  I think we are currently favoring doing that kind of thing as an AST optimization step?

I think Raymond's and my point was that the current behavior should remain unchanged by default.  So a re.DEFERRED flag whose default is False would be a possible candidate for the "new API" I suggested :)

However, compile time optimization could default to on like our peephole optimizations do[*], I don't think anyone would be upset about that.

[*] There are complaints that you can't turn them *off*, but that's a separate issue :)
msg303047 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-09-26 16:01
On Sep 26, 2017, at 11:27, R. David Murray <report@bugs.python.org> wrote:
> 
> Precompiling as a compile-time optimization would be cool.  I think we are currently favoring doing that kind of thing as an AST optimization step?

I was thinking about that too.

> I think Raymond and my point was that the current behavior should remain unchanged by default.  So a re.DEFERRED flag whose default is False would be a possible candidate for the "new API" I suggested :)

Yep, it’s definitely a behavior change, so it’s the classic “add a new API and most projects won’t get the benefits until they opt in” problem.  But the problem with that, of course, is that there are module-global re.compiles *everywhere*, including, probably most commonly, in libraries X dependencies deep, which you can’t opt into even if you know it’s safe.  There are probably ways to deal with that, but it all adds complexity.  So adding a deferred compilation API may be valuable, but only in the very long run.

The backward compatibility issue may indeed be the fatal flaw here.  I’m not sure whether the far future payoff is worth adding the new API.  Maybe, but let’s see how far we can get in the meantime.

> However, compile time optimization could default to on like our peephole optimizations do[*], I don't think anyone would be upset about that.
> 
> [*] There are complaints that you can't turn them *off*, but that's a separate issue :)

Yep!  I think I’m going to see if I can make any interesting progress on that, and we’ll just leave this issue open as a placeholder in the meantime.
msg303050 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-26 16:37
I'm flat-out opposed to changing the default behavior.  If some API change gets made, it needs to be strictly opt-in.

Even then, I don't think this is a great idea.  We already have ways to do it if people actually cared about this.

FWIW, other languages also have a regex compile(), and it always means immediate compilation.  The proposed change alters the semantics of existing code and undermines the explicit decisions of people who wrote existing packages.  It is at odds with what re.compile() means everywhere else.

Having an opt-out IMMEDIATE flag makes it difficult for package writers to maintain code across versions.  Also flags further complicate an API that is already challenging.
msg303061 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-26 18:42
Barry, please test re.search() with your changes.

$ ./python -m timeit -s 'import re; s = "a" * 100' 're.search("b", s)'
Unpatched:  500000 loops, best of 5: 529 nsec per loop
Patched:    50000 loops, best of 5: 7.46 usec per loop
msg303064 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2017-09-26 19:01
I'm also against changing re.compile() to not compile.

And I often write code like this:

    replace_whitespace = re.compile(r"\s+").sub

which is not covered by your current proposed change.
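Stefan's idiom is worth spelling out: with a naive attribute-based proxy, fetching the `.sub` bound method at module scope forces compilation immediately, so the deferral buys nothing.  A sketch with illustrative names (not the actual patch):

```python
import re

class _DeferredPattern:
    """Naive lazy proxy: compiles on first attribute access."""

    def __init__(self, pattern):
        self._pattern = pattern
        self._compiled = None

    def __getattr__(self, name):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern)
        return getattr(self._compiled, name)

proxy = _DeferredPattern(r"\s+")
replace_whitespace = proxy.sub      # attribute access compiles right here...
assert proxy._compiled is not None  # ...so nothing was actually deferred
print(replace_whitespace(" ", "a  b"))   # a b
```

Truly deferring this idiom would require the proxy to hand out lazy bound-method wrappers too, which adds yet more per-call overhead.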
msg303069 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2017-09-26 19:52
What about adding a lazy_compile() function?  It will leave the current behavior unchanged, it's explicit, and it's easier to use cross version (if importing re.lazy_compile fails, use re.compile).

FWIW I'm -1 on changing re.compile, -1 on adding re.IMMEDIATE, -0.5 on adding re.DEFERRED (not sure this option belongs among the re flags), and +1 on adding a compile-time optimization.
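The cross-version fallback Ezio describes would look like this (`re.lazy_compile` is hypothetical and does not exist in any released Python, so the except branch is what actually runs today):

```python
import re

try:
    lazy_compile = re.lazy_compile   # hypothetical future API
except AttributeError:
    lazy_compile = re.compile        # fall back to eager compilation

cre = lazy_compile(r"\d+")
print(cre.search("x42").group())   # 42
```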
msg303151 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-09-27 14:23
I'm closing this experiment.

I'm not convinced that even if we can make startup time faster for module-global regular expressions, we'll ever get enough buy-in from the ecosystem to make this worth it: you'd really want all your dependencies converted, and that would take years, especially since any wins wouldn't be possible until Python 3.7.
History
Date User Action Args
2017-09-27 14:23:18barrysetstatus: open -> closed
resolution: wont fix
messages: + msg303151

stage: patch review -> resolved
2017-09-26 19:52:17ezio.melottisetnosy: + ezio.melotti
messages: + msg303069
2017-09-26 19:01:24scodersetnosy: + scoder
messages: + msg303064
2017-09-26 18:42:25serhiy.storchakasetmessages: + msg303061
2017-09-26 16:37:02rhettingersetmessages: + msg303050
2017-09-26 16:01:54barrysetmessages: + msg303047
2017-09-26 15:27:37r.david.murraysetmessages: + msg303046
2017-09-26 14:34:30barrysetmessages: + msg303042
2017-09-26 05:07:04serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg303003
2017-09-26 02:56:39r.david.murraysetnosy: + r.david.murray
messages: + msg302997
2017-09-26 02:39:48rhettingersetnosy: + rhettinger
messages: + msg302996
2017-09-26 02:14:17barrysetkeywords: + patch
stage: patch review
pull_requests: + pull_request3742
2017-09-26 02:10:54barrycreate