classification
Title: IntFlag makes re.compile slower
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, ethan.furman, gvanrossum, haypo, inada.naoki, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-10-03 11:00 by inada.naoki, last changed 2017-10-05 10:18 by haypo. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3862 merged inada.naoki, 2017-10-03 11:05
PR 3867 closed haypo, 2017-10-03 12:48
Messages (29)
msg303594 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-03 11:00
flags exposed by re module are IntFlag from Python 3.4.

Since it is passed to underlying sre_parse and sre_compile,
tests in loop (e.g. `flags & SRE_FLAG_IGNORECASE`) calls
IntFlag.__and__ and creates new instance everytime.

It makes re.compile() slower.
msg303601 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 12:17
See also issue28637. Using IntFlag in the re module impacted the Python startup time. This was "fixed" by getting rid of re in site.py.
msg303603 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-03 12:27
Oh, Python already accepts floating point numbers:

haypo@selma$ ./python
Python 3.7.0a1+ (heads/master-dirty:efb560eee2, Oct  3 2017, 12:15:58) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile("[a-z]", flags=re.I).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>
>>> re.I
<RegexFlag.IGNORECASE: 2>
>>> re.compile("[a-z]", flags=2).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>

# and now a float
>>> re.compile("[a-z]", flags=2.0).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>
msg303605 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 12:37
:) This is due to caching. 2.0 == 2.
msg303607 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-03 12:40
Oh, right. Strange.

>>> re.compile("e", flags=2.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/python/master/Lib/re.py", line 240, in compile
    return _compile(pattern, flags)
  File "/home/haypo/prog/python/master/Lib/re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/haypo/prog/python/master/Lib/sre_compile.py", line 747, in compile
    p = sre_parse.parse(p, flags)
  File "/home/haypo/prog/python/master/Lib/sre_parse.py", line 890, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
TypeError: unsupported operand type(s) for &: 'float' and 'int'
msg303609 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-03 12:50
> :) This is due to caching. 2.0 == 2.

I proposed PR 3867 to fix the re.compile() cache: check flags type.
msg303610 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 12:52
Victor, how large is performance regression of your patch?
msg303617 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-03 13:51
Serhiy: "Victor, how large is performance regression of your patch?"

I tested bm_regex_compile.py of the Python performance benchmark suite:
https://pyperformance.readthedocs.io/benchmarks.html#regex-compile

My patch has no impact on *uncached* re.compile() performances according to this benchmark.

haypo@selma$ ./python -m perf compare_to ref.json patch.json  
Benchmark hidden because not significant (1): regex_compile

haypo@selma$ ./python -m perf compare_to ref.json patch.json  -v
Mean +- std dev: [ref] 386 ms +- 6 ms -> [patch] 387 ms +- 6 ms: 1.00x slower (+0%)
Not significant!

--

On a microbenchmark on the *cache* itself, the difference is visible:

 1018  ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o ref.json  --inherit=PYTHONPATH -v
 1019  git co re_compile_flags_type 
 1020  make
 1021  ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o patch.json  --inherit=PYTHONPATH -v
 1022  ./python -m perf compare_to ref.json patch.json  
 1023  git diff master..

haypo@selma$ ./python -m perf compare_to ref.json patch.json  
Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower (+26%)

If you replace type(flags) with flags.__class__, the change has no impact on performances :-) But obj.__class__ can be different than type(obj).
msg303626 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-03 16:24
I don't think there is a problem which is worth to be solved by PR 3867. It is very unlikely that anyone uses re functions with a pattern and float flags which accidentally matches already cached valid pattern and integer flag. If float value is passed as a flag it is likely is passed the first time. Fractional floats are rejected in any case. This is different from the problem with PR 3862.
msg303666 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-10-04 01:58
When proposing to undo recent decisions, please add the people to the nosy list who were involved in making that decision in the first place.
msg303667 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-04 02:03
> When proposing to undo recent decisions, please add the people to the nosy list who were involved in making that decision in the first place.

I don't propose reverting IntFlag.
I just propose convert IntFlag to int in re module, before passing it to sre_compile.
So this change is not visible to re users.
msg303679 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-04 10:06
Nice speedup!

I ran a benchmark on PR 3862:

 1002  ./python ~/prog/python/performance/performance/benchmarks/bm_regex_compile.py --inherit=PYTHONPATH -v -o patch.json
 1003  git co master
 1004  make
 1005  ./python ~/prog/python/performance/performance/benchmarks/bm_regex_compile.py --inherit=PYTHONPATH -v -o ref.json
 1007  ./python -m perf compare_to ref.json patch.json 

Mean +- std dev: [ref] 396 ms +- 16 ms -> [patch] 347 ms +- 8 ms: 1.14x faster (-12%)

It's the following benchmark on *uncached* regular expressions. So it really measures re.compile() performance, not the cache.
https://pyperformance.readthedocs.io/benchmarks.html#regex-compile
msg303681 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-04 12:33
Thank you for benchmarking.  I've added news entry about it.
msg303684 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-04 13:11
Naoki: "I've added news entry about it."

Nice. You might also mention the "optimization" in What's New in Python 3.7 doc, in the Optimisations section. The issue here is that it's only faster than Python 3.6, but Python 3.6 was slower than 3.5 :-) The "optimization" just makes Python 3.7 as fast as Python 3.5 ... :-)
msg303687 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 13:31
Did you compare the benchmarking results against 3.7 or 3.6?
msg303688 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-04 13:34
> Did you compare the benchmarking results against 3.7 or 3.6?

My lasted benchmark is on the master branch: original code ("ref") vs code patched with PR 3862 ("patch").
msg303690 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-04 13:47
It makes sense to report the performance gain only in comparison with the previous released version. I expect that re compiling is slower in 3.7 due to new features and optimizations. Generating more optimal code takes a time.
msg303691 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-04 13:50
I don't really care if Python 3.7 is still slower than 3.6 with PR 3862. The speedup is significant, the change is short and safe, so the PR LGTM :-) We can implement further optimizations later ;-)

By the way, speed.python.org can be used to track performances over time. Sadly, I didn't have much time to take care of the website: run manually to job to draw new plots :-)
msg303706 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2017-10-04 15:52
IntFlag.__and__ does not create a new instance every time -- all new instances are cached in the IntFlag machinery (so RegexFlag(7) is only created once).

If all the RegexFlag combinations are created before the regex compile benchmark do we still see a speed-up?
msg303732 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-04 23:12
> IntFlag.__and__ does not create a new instance every time -- all new instances are cached in the IntFlag machinery (so RegexFlag(7) is only created once).

I'm sorry, I misunderstood.
But while new instance is not created each time, 4 Python method calls
(e,g.  IntFlag.__and__() -> IntFlag.__new__() -> IntFlag._missing_() -> IntFlag._create_pseudo_member_()) are much slower than int & int.

> If all the RegexFlag combinations are created before the regex compile benchmark do we still see a speed-up?

I believe that's what Victor benchmarked.
msg303737 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-05 01:06
The perf module always starts with a "warmup" run to fill caches. If enum
has a cache, it should be filled automatically.
msg303740 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2017-10-05 04:44
INADA Naoki said:
> But while new instance is not created each time, 4 Python method
> calls (e,g.  IntFlag.__and__() -> IntFlag.__new__()
> -> IntFlag._missing_() -> IntFlag._create_pseudo_member_())
> are much slower than int & int.

Only the first two calls always happen -- the last two calls only happen for numbers that haven't been seen yet.  And yes, two Python level calls are much slower than an int & int.
msg303746 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-05 08:19
New changeset c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7 by INADA Naoki in branch 'master':
bpo-31671: re: Convert RegexFlag to int before compile (GH-3862)
https://github.com/python/cpython/commit/c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7
msg303747 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-05 08:34
Thanks Naoki!
msg303749 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-05 09:29
My PR 3867 fixes a corner case when the re.compile() is misused ("on purpose"?). I'm going to reject (abandon) my own PR since it makes re.compile() slower:

Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower (+26%)

@Serhiy, @Naoki: Are you ok to reject this PR and keep the corner case bug?
msg303750 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-10-05 09:34
Instead caching type(flags), how about this?

  if not isinstance(flags, int):
      raise TypeError(f"flags must be int or RegexFlag, got {flags!r}")
  flags = int(flags)
msg303753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-05 10:08
PR 3867 looks unpythonic to me. We usually don't check the type of arguments. This complicate and slow down a code.

Do you have a realistic example when the current behavior harms?
msg303754 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-05 10:10
Hum, I ran again my microbenchmark on re.compile() cache:

haypo@selma$ ./python -m perf timeit -s 'import re; re_compile=re.compile' 're_compile("a", flags=2)' --duplicate=1024 -o ref.json  --inherit=PYTHONPATH -v

Sadly, the commit c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7 made the cache slower:

Mean +- std dev: [ref] 364 ns +- 26 ns -> [patch] 545 ns +- 19 ns: 1.50x slower (+50%)

Just to check, I reverted the change on Scanner, the benchmark is still "560 ns +- 27 ns".


"if isinstance(flags, RegexFlag): flags = flags.value" added 181 nanoseconds? A quick "isinstance(flags, RegexFlag)" timeit microbenchmark says that this operation has a cost of 46.2 ns (+- 1.6 ns). Benchmarks are strange, as usual :-)
msg303756 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-10-05 10:18
Serhiy: "PR 3867 looks unpythonic to me. We usually don't check the type of arguments. This complicate and slow down a code. Do you have a realistic example when the current behavior harms?"

No, I don't have any realistic example. I just noticed the bug and I was surprised. In practice, flags=2.0 instead of flags=2 is not totally wrong, it's okish to accept 2.0. Let's just forget this corner case. I abandon my change.
History
Date User Action Args
2017-10-05 10:18:23hayposetmessages: + msg303756
2017-10-05 10:10:57hayposetmessages: + msg303754
2017-10-05 10:08:57serhiy.storchakasetmessages: + msg303753
2017-10-05 09:34:15inada.naokisetmessages: + msg303750
2017-10-05 09:29:44hayposetmessages: + msg303749
2017-10-05 08:34:51serhiy.storchakasettype: performance
2017-10-05 08:34:31serhiy.storchakasetmessages: + msg303747
2017-10-05 08:20:18inada.naokisetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2017-10-05 08:19:32inada.naokisetmessages: + msg303746
2017-10-05 04:44:41ethan.furmansetmessages: + msg303740
2017-10-05 01:06:36hayposetmessages: + msg303737
2017-10-04 23:12:30inada.naokisetmessages: + msg303732
2017-10-04 15:52:33ethan.furmansetmessages: + msg303706
2017-10-04 13:50:59hayposetmessages: + msg303691
2017-10-04 13:47:41serhiy.storchakasetmessages: + msg303690
2017-10-04 13:34:34hayposetmessages: + msg303688
2017-10-04 13:31:42serhiy.storchakasetmessages: + msg303687
2017-10-04 13:11:15hayposetmessages: + msg303684
2017-10-04 12:33:04inada.naokisetmessages: + msg303681
2017-10-04 10:06:26hayposetmessages: + msg303679
2017-10-04 02:03:20inada.naokisetmessages: + msg303667
2017-10-04 01:58:26rhettingersetnosy: + rhettinger, ethan.furman, gvanrossum
messages: + msg303666
2017-10-03 16:24:45serhiy.storchakasetmessages: + msg303626
2017-10-03 14:00:31barrysetnosy: + barry
2017-10-03 13:51:37hayposetmessages: + msg303617
2017-10-03 12:52:39serhiy.storchakasetmessages: + msg303610
2017-10-03 12:50:24hayposetmessages: + msg303609
2017-10-03 12:48:04hayposetpull_requests: + pull_request3847
2017-10-03 12:40:12hayposetmessages: + msg303607
2017-10-03 12:37:59serhiy.storchakasetmessages: + msg303605
2017-10-03 12:27:35hayposetmessages: + msg303603
2017-10-03 12:23:16hayposetnosy: + haypo
2017-10-03 12:17:55serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg303601
2017-10-03 11:05:28inada.naokisetkeywords: + patch
stage: patch review
pull_requests: + pull_request3842
2017-10-03 11:00:30inada.naokicreate