Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fastsearch algorithm for rfind/rindex #51711

Closed
florentx mannequin opened this issue Dec 8, 2009 · 25 comments
Closed

Implement fastsearch algorithm for rfind/rindex #51711

florentx mannequin opened this issue Dec 8, 2009 · 25 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@florentx
Copy link
Mannequin

florentx mannequin commented Dec 8, 2009

BPO 7462
Nosy @rhettinger, @pitrou, @ezio-melotti, @florentx
Files
  • issue7462_string_tests.diff: Patch, apply to trunk
  • issue7462_fastsearch_v3.diff: Patch, apply to trunk
  • stringbench_fixes_and_additions_v3.diff: Patch for sandbox/trunk/stringbench/
  • bench_rfind_algorithms_v3.diff: Benchmark Results (including rpartition and rsplit)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2010-01-02.21:42:27.021>
    created_at = <Date 2009-12-08.22:46:43.272>
    labels = ['interpreter-core', 'performance']
    title = 'Implement fastsearch algorithm for rfind/rindex'
    updated_at = <Date 2010-01-04.09:26:26.641>
    user = 'https://github.com/florentx'

    bugs.python.org fields:

    activity = <Date 2010-01-04.09:26:26.641>
    actor = 'ezio.melotti'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2010-01-02.21:42:27.021>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2009-12-08.22:46:43.272>
    creator = 'flox'
    dependencies = []
    files = ['15634', '15648', '15655', '15656']
    hgrepos = []
    issue_num = 7462
    keywords = ['patch']
    message_count = 25.0
    messages = ['96157', '96163', '96724', '96725', '96726', '96727', '96729', '96738', '96739', '96743', '96747', '96748', '96749', '96750', '96757', '96771', '97039', '97084', '97087', '97103', '97105', '97106', '97146', '97148', '97200']
    nosy_count = 5.0
    nosy_names = ['effbot', 'rhettinger', 'pitrou', 'ezio.melotti', 'flox']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue7462'
    versions = ['Python 2.7', 'Python 3.2']

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 8, 2009

    While looking at bpo-7458 I stopped on:

    /* XXX - create reversefastsearch helper! */

    Here is the patch which is just the translation of the same algorithm
    already implemented for "find/index".
    http://effbot.org/zone/stringlib.htm

    Note: it supersedes patch for 7458.

    @florentx florentx mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Dec 8, 2009
    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 9, 2009

    (removed comment which should not be copied)

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 20, 2009

    Additional test cases for rfind.

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 20, 2009

    Bench results show the benefit.

    And attached patch for stringbench tool.

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 20, 2009

    There's a typo in the patch for stringbench.

    Updated patch (with rindex tests, too).

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 20, 2009

    Updated benchmarks.

    str unicode
    (ms) (ms)

    ========== late match, 100 characters
    s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s)
    32.89 15.65 rfind (classic)
    32.81 15.63 rindex (classic)
    11.77 13.27 rfind (fastsearch)
    11.78 13.40 rindex (fastsearch)

    ========== late match, two characters
    4.34 2.36 ("C"+"AB"*300).rfind("CA") (classic)
    4.44 2.36 ("C"+"AB"*300).rindex("CA") (classic)
    2.10 2.31 ("C"+"AB"*300).rfind("CA") (fastsearch)
    2.10 2.32 ("C"+"AB"*300).rindex("CA") (fastsearch)

    ========== no match, two characters
    14.12 13.46 ("AB"*1000).rfind("BC") (classic)
    7.67 8.26 ("AB"*1000).rfind("BC") (fastsearch)

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    Need additional patch for rpartition and rsplit on string, unicode and
    bytearray objects.

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    Updated patch with optimization for:

    • rfind
    • rindex
    • rsplit
    • rpartition

    And an optimization was implemented in r46398 but never used because
    "#define USE_FAST" was removed 2 hours before: r46366.
    ==> I changed this to activate optimization on "S.split" too.

    I propose to commit these changes before looking for further improvements.
    (I plan to refactor some functions and macro to Objects/stringlib)

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    Actually, the USE_FLAG macro was removed in r46367 (not 46366).

    « needforspeed: remove remaining USE_FAST macros; if fastsearch was
    broken, someone would have noticed by now ;-) »

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    I reupload the patch fixed (sorry).

    @pitrou
    Copy link
    Member

    pitrou commented Dec 21, 2009

    Some things:

    • is STRINGLIB_CMP still used? if not, it should be removed (use grep :-))
    • please don't use "#if 1"
    • if USE_FAST isn't used anymore, it should be remove as well
    • did you check that rpartition, split and friends were sped up by the
      patch?

    Thanks for working on this!

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    Actually I see different macros which do the same thing: I will consider
    reusing STRINGLIB_CMP to re-define PyUNICODE_MATCH and PySTRING_MATCH
    (or create aliases if they have the same signature).

    I can prepare a "all-in-one" patch which fixes all these things.
    But don't you think we should do this incrementally, i.e. commit the
    current patch before refactoring more code?

    I will remove the "#if 1 /* USE_FAST */".

    @pitrou
    Copy link
    Member

    pitrou commented Dec 21, 2009

    Actually I see different macros which do the same thing: I will consider
    reusing STRINGLIB_CMP to re-define PyUNICODE_MATCH and PySTRING_MATCH
    (or create aliases if they have the same signature).

    STRINGLIB_CMP, as the name implies, should only be used by stringlib.
    (anything which doesn't start with Py* shouldn't be exported in the
    official include files)

    But don't you think we should do this incrementally, i.e. commit the
    current patch before refactoring more code?

    Well, if STRINGLIB_CMP isn't used anymore, removing it should be part of
    the current issue. There's no reason to leave dead code in the source
    tree. I agree that further cleanups can be part of another issue.

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    Removing "slow" parts and unnused macros STRINGLIB_CMP and USE_FAST.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 21, 2009

    I haven't reviewed the algorithm (I don't know if I will, honestly), but
    at least on the principle this looks good.
    Fredrik, do you want to take a look?

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 21, 2009

    New benchmarks (and patch for the benchmark tool).

    Best improvement is reached with such expression:
    s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)

    String (classic): 93.14 ms
    String (fast): 8.78 ms

    Unicode (classic): 78.62 ms
    Unicode (fast): 9.42 ms

    @pitrou
    Copy link
    Member

    pitrou commented Dec 30, 2009

    Looking a bit more at the patch:

    + /* miss: check if previous character is part of pattern */
    + if (!(mask & (1 << (s[i-1] & 0x1F))))

    From what I understand, this should be s[i-m]. Same here:

    + /* skip: check if previous character is part of pattern */
    + if (!(mask & (1 << (s[i-1] & 0x1F))))

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 31, 2009

    I checked the code, and it is the right thing:

    Example 1 (fastsearch):

    s, p = "ABCDEABCF", "BCD"
    s.rfind(p)

    # i = 6 is first candidate, but "BCF" != "BCD"
    # then s[i-1] = "A" is not part of pattern
    # so we skip 3 chars: next loop is i = 2
    # etc...

    Example 2 (msg97039):

    s, p = "ABCDBCBCF", "BCB"
    s.rfind(p)

    # i = 6 is first candidate, but "BCF" != "BCB"
    # then s[i-m] = "D" is not part of pattern
    # so we skip 3 chars: next loop is i = 2.. and it fails!

    I've tested many examples to understand and verify the algorithm.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 31, 2009

    I checked the code, and it is the right thing:

    Indeed, you are right. My bad!

    @pitrou
    Copy link
    Member

    pitrou commented Dec 31, 2009

    I will take a last look at it and commit if I see nothing wrong.

    @pitrou pitrou self-assigned this Dec 31, 2009
    @rhettinger
    Copy link
    Contributor

    Are there any simple, common cases that are made slower by this patch?
    IIRC, that was the reason this wasn't implemented previously.

    @florentx
    Copy link
    Mannequin Author

    florentx mannequin commented Dec 31, 2009

    The benchmark tests show significant improvements in most cases up to 10
    times faster.

    And I found no test case which show performance regression (using the
    "stringbench" benchmark from GvR, and additional tests).

    Moreover, the very same algorithm is already implemented for
    find/index/partition.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 2, 2010

    I've added a version number to stringbench and committed the changes in r77240.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 2, 2010

    The main patch has been committed in r77241 (trunk) and r77246 (py3k).
    I've ommitted the tests you had added for bpo-7458.
    Thank you!

    @pitrou pitrou closed this as completed Jan 2, 2010
    @effbot
    Copy link
    Mannequin

    effbot mannequin commented Jan 4, 2010

    Thanks Florent!

    Are there any simple, common cases that are made slower by this patch?

    The original fastsearch implementation has a couple of special cases to make sure it's faster than the original code in all cases. The reason it wasn't implemented for reverse search was more a question of developer time constraints; reverse search isn't nearly as common as forward search, and we had other low-hanging fruit to deal with.

    (btw, while it's great that someone finally got around to fix this, it wouldn't surprise me if replacing the KMP implementation in SRE with a fastsearch would save as many CPU cycles worldwide as this patch :)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants