Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore re performance to pre-PEP393 level #62885

Closed
serhiy-storchaka opened this issue Aug 8, 2013 · 23 comments
Closed

Restore re performance to pre-PEP393 level #62885

serhiy-storchaka opened this issue Aug 8, 2013 · 23 comments
Assignees
Labels
performance Performance or resource usage topic-regex topic-unicode

Comments

@serhiy-storchaka
Copy link
Member

BPO 18685
Nosy @tim-one, @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka
Dependencies
  • bpo-18672: Fix format specifiers for debug output in _sre.c
  • Files
  • sre_optimize.patch
  • sre_optimize_2.patch
  • sre_optimize_3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-10-26.17:08:54.625>
    created_at = <Date 2013-08-08.13:10:17.918>
    labels = ['expert-regex', 'expert-unicode', 'performance']
    title = 'Restore re performance to pre-PEP393 level'
    updated_at = <Date 2013-10-26.17:08:54.623>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2013-10-26.17:08:54.623>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-10-26.17:08:54.625>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions', 'Unicode']
    creation = <Date 2013-08-08.13:10:17.918>
    creator = 'serhiy.storchaka'
    dependencies = ['18672']
    files = ['31198', '32283', '32363']
    hgrepos = []
    issue_num = 18685
    keywords = ['patch']
    message_count = 23.0
    messages = ['194669', '194683', '194727', '194749', '194757', '194760', '194762', '194765', '195101', '195128', '200813', '201189', '201233', '201289', '201302', '201324', '201327', '201328', '201334', '201335', '201336', '201372', '201375']
    nosy_count = 8.0
    nosy_names = ['tim.peters', 'pitrou', 'vstinner', 'ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka', 'Tal.Weiss']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18685'
    versions = ['Python 3.4']

    @serhiy-storchaka
    Copy link
    Member Author

    Before PEP-393 the regex functions scanned an array of char or Py_UNICODE and character testing was cheap. After PEP-393 they checks a kind of an unicode string for every tested character and processing of unicode strings becomes slower. _sre.c already generates two sets of functions from one source -- for byte and unicode strings. The proposed patch uses same technique to generate three sets of functions -- for byte/UCS1, UCS2 and UCS4 strings. This simplifies the code (now it more similar to pre-PEP393 version) and makes characters testing faster.

    Benchmark example:

    Python 3.2:
    $ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    1000 loops, best of 3: 613 usec per loop
    $ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    1000 loops, best of 3: 232 usec per loop
    $ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    1000 loops, best of 3: 217 usec per loop

    Python 3.4.0a1+ unpatched:
    $ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    1000 loops, best of 3: 485 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    1000 loops, best of 3: 790 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    1000 loops, best of 3: 1.09 msec per loop

    Python 3.4.0a1+ patched:
    $ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    1000 loops, best of 3: 250 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    1000 loops, best of 3: 250 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    1000 loops, best of 3: 256 usec per loop

    I also propose for simplicity extract a template part of _sre.c to separated file (i.e. srelib.h) and get rid of recursion.

    @serhiy-storchaka serhiy-storchaka self-assigned this Aug 8, 2013
    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Aug 8, 2013

    It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower.

    I get somewhat different results (Windows XP Pro, 32-bit):

    C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    1000 loops, best of 3: 449 usec per loop

    C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    1000 loops, best of 3: 506 usec per loop

    C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    1000 loops, best of 3: 506 usec per loop

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    1000 loops, best of 3: 227 usec per loop

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    1000 loops, best of 3: 339 usec per loop

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    1000 loops, best of 3: 504 usec per loop

    For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re.

    However, the speed appears to be a lot more consistent:

    C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2013

    I get the same kind of results as Serhiy:

    $ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000"  "f(x)"
    10000 loops, best of 3: 81.7 usec per loop
    $ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000"  "f(x)"
    10000 loops, best of 3: 31.1 usec per loop
    $ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000"  "f(x)"
    10000 loops, best of 3: 31.1 usec per loop

    Unpatched 3.4:

    $ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000"  "f(x)"
    10000 loops, best of 3: 81.6 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000"  "f(x)"
    10000 loops, best of 3: 163 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000"  "f(x)"
    10000 loops, best of 3: 190 usec per loop

    Patched 3.4:

    $ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000"  "f(x)"
    10000 loops, best of 3: 54.4 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000"  "f(x)"
    10000 loops, best of 3: 54.2 usec per loop
    $ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000"  "f(x)"
    10000 loops, best of 3: 54.5 usec per loop

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Aug 9, 2013

    @antoine: Are you on the same OS as Serhiy?

    IIRC, wasn't the performance regression that wxjmfauth complained about in Python 3.3 apparent on Windows, but not on Linux?

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2013

    @antoine: Are you on the same OS as Serhiy?

    I don't know, I'm under Linux with gcc (on a x86-64 machine) :-)

    IIRC, wasn't the performance regression that wxjmfauth complained
    about in Python 3.3 apparent on Windows, but not on Linux?

    I don't know, but I'm not willing to give any attention to something
    reported by jmfauth. He's very far in the trollzone, as far as I'm
    concerned.

    However, if you are under Windows and can give it a try, it would be
    nice to have performance numbers for Serhiy's patch.

    @mrabarnett
    Copy link
    Mannequin

    mrabarnett mannequin commented Aug 9, 2013

    With the patch the results are:

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
    10000 loops, best of 3: 113 usec per loop

    I'm most impressed! :-)

    @serhiy-storchaka
    Copy link
    Member Author

    I'm under 32-bit Linux with gcc 4.6.3.

    The above test is only one example for which I expect largest difference. I suppose other tests will show a gain too.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 9, 2013

    Ok, here are some results from the benchmarks suite:

    Report on Linux fsol 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64
    Total CPU cores: 4

    ### regex_effbot ###
    Min: 0.058952 -> 0.054367: 1.08x faster
    Avg: 0.059060 -> 0.054378: 1.09x faster
    Significant (t=132.69)
    Stddev: 0.00008 -> 0.00001: 5.9597x smaller

    ### regex_v8 ###
    Min: 0.063401 -> 0.050701: 1.25x faster
    Avg: 0.066147 -> 0.053530: 1.24x faster
    Significant (t=3.22)
    Stddev: 0.00608 -> 0.00630: 1.0363x larger

    The following not significant results are hidden, use -v to show them:
    regex_compile.

    @vstinner
    Copy link
    Member

    Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib. ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-(

    If you move code from _sre.c to a new file, you should use "hg cp" to keep the history. For the review, it's maybe better to continue with your SRE_RECURSIVE hack :)

    --

    #define SRE_CHAR Py_UCS1
    #define SIZEOF_SRE_CHAR 1
    ..
    #define SRE_CHAR Py_UCS2
    #define SIZEOF_SRE_CHAR 1
    ...
    #define SRE_CHAR Py_UCS4
    #define SIZEOF_SRE_CHAR 1

    The value of SIZEOF_SRE_CHAR looks suspicious.

    Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests!

    @serhiy-storchaka
    Copy link
    Member Author

    Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib. ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-(

    Agree, but a patch will be larger and harder for the synchronization and for the review in Rietveld. I'm going first solve other issues (bpo-18647, bpo-18672) before creating a large final patch.

    The value of SIZEOF_SRE_CHAR looks suspicious.

    Good catch. Actually this macro is used only in order to skip some checks for UCS4. It should not affects the correctness, only possible the performance.

    Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests!

    There is a small number (about 10) of tests for non-ASCII data.

    @serhiy-storchaka
    Copy link
    Member Author

    Rebased patch to tip and added non-ASCII tests for main re functions.

    @serhiy-storchaka
    Copy link
    Member Author

    Please review this patch. I will extract template part into separated file in separated commit.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 25, 2013

    Posted review on Rietveld. See also issue bpo-19387.

    @serhiy-storchaka
    Copy link
    Member Author

    Updated patch addresses Antoine's comments.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 25, 2013

    Looks good to me.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 26, 2013

    New changeset 66e2dfbb1d70 by Serhiy Storchaka in branch 'default':
    Issue bpo-18685: Restore re performance to pre-PEP 393 levels.
    http://hg.python.org/cpython/rev/66e2dfbb1d70

    @vstinner
    Copy link
    Member

    Sorry, I was busy with my tracemalloc PEP, I didn't havee time to review your patch. I'm happy that you restored Python 3.2 performances! Thanks Serhiy.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 26, 2013

    New changeset 00e61cb3b11c by Serhiy Storchaka in branch 'default':
    Issue bpo-18685: Extract template part of _sre.c into separated sre_lib.h file.
    http://hg.python.org/cpython/rev/00e61cb3b11c

    @pitrou
    Copy link
    Member

    pitrou commented Oct 26, 2013

    Didn't you forget to add sre_lib.h?

    @pitrou
    Copy link
    Member

    pitrou commented Oct 26, 2013

    Ah, sorry, no. I was fooled by the commit e-mail.

    @serhiy-storchaka
    Copy link
    Member Author

    Yes, the commit e-mail looks queer.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 26, 2013

    I suppose this issue can be fixed then. Thanks for doing this!

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for your review Antoine and Victor.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage topic-regex topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants