msg194669 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-08 13:10 |
Before PEP 393 the regex functions scanned an array of char or Py_UNICODE and character testing was cheap. After PEP 393 they checks a kind of an unicode string for every tested character and processing of unicode strings becomes slower. _sre.c already generates two sets of functions from one source -- for byte and unicode strings. The proposed patch uses same technique to generate three sets of functions -- for byte/UCS1, UCS2 and UCS4 strings. This simplifies the code (now it more similar to pre-PEP393 version) and makes characters testing faster.
Benchmark example:
Python 3.2:
$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 613 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 232 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 217 usec per loop
Python 3.4.0a1+ unpatched:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 485 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 790 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 1.09 msec per loop
Python 3.4.0a1+ patched:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 250 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 250 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 256 usec per loop
I also propose for simplicity extract a template part of _sre.c to separated file (i.e. srelib.h) and get rid of recursion.
|
msg194683 - (view) |
Author: Matthew Barnett (mrabarnett) * |
Date: 2013-08-08 15:05 |
It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower.
I get somewhat different results (Windows XP Pro, 32-bit):
C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 449 usec per loop
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop
C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 227 usec per loop
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 339 usec per loop
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 504 usec per loop
For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re.
However, the speed appears to be a lot more consistent:
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
|
msg194727 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-09 13:29 |
I get the same kind of results as Serhiy:
$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.7 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop
Unpatched 3.4:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.6 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 163 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 190 usec per loop
Patched 3.4:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 54.4 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 54.2 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 54.5 usec per loop
|
msg194749 - (view) |
Author: Matthew Barnett (mrabarnett) * |
Date: 2013-08-09 16:04 |
@Antoine: Are you on the same OS as Serhiy?
IIRC, wasn't the performance regression that wxjmfauth complained about in Python 3.3 apparent on Windows, but not on Linux?
|
msg194757 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-09 18:57 |
> @Antoine: Are you on the same OS as Serhiy?
I don't know, I'm under Linux with gcc (on a x86-64 machine) :-)
> IIRC, wasn't the performance regression that wxjmfauth complained
> about in Python 3.3 apparent on Windows, but not on Linux?
I don't know, but I'm not willing to give any attention to something
reported by jmfauth. He's very far in the trollzone, as far as I'm
concerned.
However, if you are under Windows and can give it a try, it would be
nice to have performance numbers for Serhiy's patch.
|
msg194760 - (view) |
Author: Matthew Barnett (mrabarnett) * |
Date: 2013-08-09 20:03 |
With the patch the results are:
C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
I'm most impressed! :-)
|
msg194762 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-09 20:11 |
I'm under 32-bit Linux with gcc 4.6.3.
The above test is only one example for which I expect largest difference. I suppose other tests will show a gain too.
|
msg194765 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-08-09 20:18 |
Ok, here are some results from the benchmarks suite:
Report on Linux fsol 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64
Total CPU cores: 4
### regex_effbot ###
Min: 0.058952 -> 0.054367: 1.08x faster
Avg: 0.059060 -> 0.054378: 1.09x faster
Significant (t=132.69)
Stddev: 0.00008 -> 0.00001: 5.9597x smaller
### regex_v8 ###
Min: 0.063401 -> 0.050701: 1.25x faster
Avg: 0.066147 -> 0.053530: 1.24x faster
Significant (t=3.22)
Stddev: 0.00608 -> 0.00630: 1.0363x larger
The following not significant results are hidden, use -v to show them:
regex_compile.
|
msg195101 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-08-13 22:57 |
Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib. ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-(
If you move code from _sre.c to a new file, you should use "hg cp" to keep the history. For the review, it's maybe better to continue with your SRE_RECURSIVE hack :)
--
#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
..
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 1
...
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 1
The value of SIZEOF_SRE_CHAR looks suspicious.
Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests!
|
msg195128 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-14 09:44 |
> Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib. ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-(
Agree, but a patch will be larger and harder for the synchronization and for the review in Rietveld. I'm going first solve other issues (issue18647, issue18672) before creating a large final patch.
> The value of SIZEOF_SRE_CHAR looks suspicious.
Good catch. Actually this macro is used only in order to skip some checks for UCS4. It should not affects the correctness, only possible the performance.
> Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests!
There is a small number (about 10) of tests for non-ASCII data.
|
msg200813 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-21 17:03 |
Rebased patch to tip and added non-ASCII tests for main re functions.
|
msg201189 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-24 21:15 |
Please review this patch. I will extract template part into separated file in separated commit.
|
msg201233 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-25 12:43 |
Posted review on Rietveld. See also issue #19387.
|
msg201289 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-25 20:28 |
Updated patch addresses Antoine's comments.
|
msg201302 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-25 23:12 |
Looks good to me.
|
msg201324 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-10-26 07:46 |
New changeset 66e2dfbb1d70 by Serhiy Storchaka in branch 'default':
Issue #18685: Restore re performance to pre-PEP 393 levels.
http://hg.python.org/cpython/rev/66e2dfbb1d70
|
msg201327 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-10-26 08:15 |
Sorry, I was busy with my tracemalloc PEP, I didn't havee time to review your patch. I'm happy that you restored Python 3.2 performances! Thanks Serhiy.
|
msg201328 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-10-26 08:19 |
New changeset 00e61cb3b11c by Serhiy Storchaka in branch 'default':
Issue #18685: Extract template part of _sre.c into separated sre_lib.h file.
http://hg.python.org/cpython/rev/00e61cb3b11c
|
msg201334 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-26 10:28 |
Didn't you forget to add sre_lib.h?
|
msg201335 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-26 10:32 |
Ah, sorry, no. I was fooled by the commit e-mail.
|
msg201336 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-26 11:00 |
Yes, the commit e-mail looks queer.
|
msg201372 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-10-26 16:50 |
I suppose this issue can be fixed then. Thanks for doing this!
|
msg201375 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-10-26 17:08 |
Thank you for your review Antoine and Victor.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:49 | admin | set | github: 62885 |
2013-10-26 17:08:54 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg201375
stage: commit review -> resolved |
2013-10-26 16:50:24 | pitrou | set | messages:
+ msg201372 |
2013-10-26 11:00:57 | serhiy.storchaka | set | messages:
+ msg201336 |
2013-10-26 10:32:17 | pitrou | set | messages:
+ msg201335 |
2013-10-26 10:28:42 | pitrou | set | messages:
+ msg201334 |
2013-10-26 08:19:30 | python-dev | set | messages:
+ msg201328 |
2013-10-26 08:15:17 | vstinner | set | messages:
+ msg201327 |
2013-10-26 07:46:23 | python-dev | set | nosy:
+ python-dev messages:
+ msg201324
|
2013-10-25 23:12:20 | pitrou | set | messages:
+ msg201302 stage: patch review -> commit review |
2013-10-25 20:28:26 | serhiy.storchaka | set | files:
+ sre_optimize_3.patch
messages:
+ msg201289 |
2013-10-25 12:43:40 | pitrou | set | messages:
+ msg201233 |
2013-10-24 21:15:31 | serhiy.storchaka | set | messages:
+ msg201189 |
2013-10-23 20:32:00 | serhiy.storchaka | set | dependencies:
- Pointers point out of array bound in _sre.c |
2013-10-21 17:03:30 | serhiy.storchaka | set | files:
+ sre_optimize_2.patch
messages:
+ msg200813 |
2013-08-29 10:13:50 | serhiy.storchaka | set | dependencies:
+ Fix format specifiers for debug output in _sre.c, Pointers point out of array bound in _sre.c |
2013-08-25 17:14:49 | Tal.Weiss | set | nosy:
+ Tal.Weiss
|
2013-08-14 09:44:08 | serhiy.storchaka | set | messages:
+ msg195128 |
2013-08-13 22:57:45 | vstinner | set | messages:
+ msg195101 |
2013-08-09 20:18:26 | pitrou | set | messages:
+ msg194765 |
2013-08-09 20:11:54 | serhiy.storchaka | set | messages:
+ msg194762 |
2013-08-09 20:03:01 | mrabarnett | set | messages:
+ msg194760 |
2013-08-09 20:00:01 | vstinner | set | nosy:
+ vstinner
|
2013-08-09 18:57:15 | pitrou | set | messages:
+ msg194757 |
2013-08-09 16:04:49 | mrabarnett | set | messages:
+ msg194749 |
2013-08-09 13:29:51 | pitrou | set | nosy:
+ pitrou messages:
+ msg194727
|
2013-08-09 08:08:50 | pitrou | set | nosy:
+ tim.peters
|
2013-08-08 15:05:54 | mrabarnett | set | messages:
+ msg194683 |
2013-08-08 13:10:17 | serhiy.storchaka | create | |