Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance regression in string replace for 3.3 #60265

Closed
BreamoreBoy mannequin opened this issue Sep 27, 2012 · 22 comments
Closed

performance regression in string replace for 3.3 #60265

BreamoreBoy mannequin opened this issue Sep 27, 2012 · 22 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode

Comments

@BreamoreBoy
Copy link
Mannequin

BreamoreBoy mannequin commented Sep 27, 2012

BPO 16061
Nosy @loewis, @terryjreedy, @jcea, @pitrou, @vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka, @kushaldas
Files
  • unicode.patch
  • unicode_2.patch
  • replacebench.py: Microbenchmark for regularly distributed replacements
  • replacebench2.py: Microbenchmark for randomly distributed replacements
  • str_replace_1char.patch
  • str_replace_1char_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-04-13.19:47:21.116>
    created_at = <Date 2012-09-27.16:03:07.268>
    labels = ['interpreter-core', 'expert-unicode', 'performance']
    title = 'performance regression in string replace for 3.3'
    updated_at = <Date 2013-04-13.19:47:21.115>
    user = 'https://bugs.python.org/BreamoreBoy'

    bugs.python.org fields:

    activity = <Date 2013-04-13.19:47:21.115>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-04-13.19:47:21.116>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2012-09-27.16:03:07.268>
    creator = 'BreamoreBoy'
    dependencies = []
    files = ['27521', '27526', '27544', '27545', '27553', '29721']
    hgrepos = []
    issue_num = 16061
    keywords = ['patch', '3.3regression']
    message_count = 22.0
    messages = ['171378', '171407', '171413', '172602', '172628', '172691', '172769', '172780', '172821', '178604', '178606', '178607', '178609', '178612', '178615', '178666', '185864', '185948', '186248', '186249', '186807', '186808']
    nosy_count = 12.0
    nosy_names = ['loewis', 'terry.reedy', 'jcea', 'pitrou', 'vstinner', 'thomaslee', 'benjamin.peterson', 'ezio.melotti', 'BreamoreBoy', 'python-dev', 'serhiy.storchaka', 'kushal.das']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue16061'
    versions = ['Python 3.4']

    @BreamoreBoy
    Copy link
    Mannequin Author

    BreamoreBoy mannequin commented Sep 27, 2012

    Quoting Steven D'Aprano on c.l.p.

    "But add a call to replace, and things are very different:

    [steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
    100000 loops, best of 3: 9.3 usec per loop
    [steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
    100000 loops, best of 3: 5.43 usec per loop
    [steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
    100000 loops, best of 3: 18.3 usec per loop

    Three times slower, even for pure-ASCII strings. I get comparable results for Unicode. Notice how slow Python 2.7 is:

    [steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
    10000 loops, best of 3: 65.6 usec per loop
    [steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
    100000 loops, best of 3: 2.79 usec per loop
    [steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
    10000 loops, best of 3: 23.7 usec per loop

    Even with the performance regression, it is still over twice as fast as
    Python 2.7.

    Nevertheless, I think there is something here. The consequences are nowhere near as dramatic as jmf claims, but it does seem that replace() has taken a serious performance hit. Perhaps it is unavoidable, but perhaps not.

    If anyone else can confirm similar results, I think this should be raised as a performance regression.

    Quoting Serhiy Storchaka in response.

    "Yes, I confirm, it's a performance regression. It should be avoidable.
    Almost any PEP-393 performance regression can be avoided. At least for
    such corner case. Just no one has yet optimized this case."

    @BreamoreBoy BreamoreBoy mannequin added the performance Performance or resource usage label Sep 27, 2012
    @serhiy-storchaka serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 27, 2012
    @thomaslee
    Copy link
    Mannequin

    thomaslee mannequin commented Sep 28, 2012

    My results aren't quite as dramatic as yours, but there does appear to be a regression:

    $ ./python -V
    Python 2.7.3+
    
    $ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
    100000 loops, best of 3: 16.5 usec per loop
    
    $ ./python -V
    Python 3.3.0rc3+
    
    $ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
    10000 loops, best of 3: 22.7 usec per loop

    @vstinner
    Copy link
    Member

    Python 3.3 is 2x faster than Python 3.2 to replace a character with
    another if the string only contains the character 3 times. This is not
    acceptable, Python 3.3 must be as slow as Python 3.2!

    $ python3.2 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
    after='à'; s.replace(ch, after)"
    100000 loops, best of 3: 3.62 usec per loop
    $ python3.3 -m timeit "ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
    after='à'; s.replace(ch, after)"
    1000000 loops, best of 3: 1.36 usec per loop
    
    $ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
    after='Ł'; s.replace(ch, after)"
    100000 loops, best of 3: 3.15 usec per loop
    $ python3.2 -m timeit "ch='€'; sp=' '*1000; s = ch+sp+ch+sp+ch;
    after='Ł'; s.replace(ch, after)"
    1000000 loops, best of 3: 1.91 usec per loop

    More seriously, I changed the algorithm of str.replace(before, after)
    when before and after are only one character: changeset c802bfc8acfc.
    The code is now using the heavily optimized findchar() function.
    PyUnicode_READ() is slow and should be avoided when possible:
    PyUnicode_READ() macro is expanded to 2 if, whereas findchar() uses
    directly pointer of the right type (Py_UCS1*, Py_UCS2* or Py_UCS4*).

    In Python 3.2, the code looks like:

                for (i = 0; i < u->length; i++) {
                    if (u->str[i] == u1) {
                        if (--maxcount < 0)
                            break;
                        u->str[i] = u2;
                    }
                }

    In Python 3.3, the code looks like:

                pos = findchar(sbuf, PyUnicode_KIND(self), slen, u1, 1);
                if (pos < 0)
                    goto nothing;
                ...
                while (--maxcount)
                {
                    pos++;
                    src += pos * PyUnicode_KIND(self);
                    slen -= pos;
                    index += pos;
                    pos = findchar(src, PyUnicode_KIND(self), slen, u1, 1);
                    if (pos < 0)
                        break;
                    PyUnicode_WRITE(rkind, PyUnicode_DATA(u), index + pos, u2);
                }

    @vstinner
    Copy link
    Member

    The code is now using the heavily optimized findchar() function.

    I compared performances of the two methods: dummy loop vs find. Results with a string of 100,000 characters:

    • Replace 100% (rewrite all characters): find is 12.5x slower than a loop
    • Replace 50%: find is 3.3x slower
    • Replace only 2 characters (0.001%): find is 10.4x faster

    In practice, I bet that the most common case is to replace only a few characters. Replace all characters is a rare usecase.

    Use attached "unicode.patch" on Python 3.4 with the following commands to reproduce my benchmark:

    python -m timeit -s "a='a'; b='b'; text=a*100000" "text.replace(a, b)"
    python -m timeit -s "a='a'; b='b'; text=(a+' ')*(100000//2)" "text.replace(a, b)"
    python -m timeit -s "a='a'; b='b'; text=a+' '*100000+a" "text.replace(a, b)"

    --

    An option is to use the find method, and then switch to the dummy loop method if there are too much characters to replace. I don't know if it's necessary to develop such complex algorithm. It would be better to have a benchmark extracted from a real world application like a template engine.

    @serhiy-storchaka
    Copy link
    Member

    I compared performances of the two methods: dummy loop vs find.

    You can hybridize them. First just compare chars and if not match then use
    memcmp(). This speed up the case of repeated chars.

    @vstinner
    Copy link
    Member

    You can hybridize them. First just compare chars and if not match then use
    memcmp(). This speed up the case of repeated chars.

    Oh, you're patch is simple and it's amazing fast! I compare unicode with
    Python 2.7, 3.2, 3.4 and 3.4 patched, and bytes with 2.7. Using your patch,
    Python 3.4 is the fastest implemented in most cases.

    Common platform:
    CPU model: Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
    Bits: int=32, long=32, long long=64, pointer=32
    Platform: Linux-3.2.0-31-generic-pae-i686-with-debian-wheezy-sid

    Platform of campaign 2.7-bytes:
    Python unicode implementation: UTF-16
    Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
    CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
    -Wstrict-prototypes
    SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
    -0700"
    Date: 2012-10-11 14:41:49

    Platform of campaign 2.7-unicode:
    Python unicode implementation: UTF-16
    Python version: 2.7.3+ (2.7:19d37c8d1882+, Oct 9 2012, 14:37:36) [GCC 4.6.3]
    CFLAGS: -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
    -Wstrict-prototypes
    SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
    -0700"
    Date: 2012-10-11 14:42:55

    Platform of campaign 3.2-wide:
    Python unicode implementation: UCS-4
    Python version: 3.2.3+ (3.2:f7615ee43318, Sep 27 2012, 15:00:15) [GCC 4.6.3]
    CFLAGS: -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
    -0700"
    Date: 2012-10-11 14:41:30

    Platform of campaign 3.4:
    Python unicode implementation: PEP-393
    Python version: 3.4.0a0 (default:ad51ed93377c, Oct 11 2012, 14:40:51) [GCC
    4.6.3]
    CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    SCM: hg revision=ad51ed93377c tag=tip branch=default date="2012-10-11 00:11
    -0700"
    Date: 2012-10-11 14:40:52

    Platform of campaign 3.4-patch:
    Date: 2012-10-11 14:40:25
    Python version: 3.4.0a0 (default:ad51ed93377c+, Oct 11 2012, 14:33:04) [GCC
    4.6.3]
    CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    SCM: hg revision=ad51ed93377c+ tag=tip branch=default date="2012-10-11
    00:11 -0700"
    Python unicode implementation: PEP-393

    ----------------+-----------------+-----------------+-----------------+-----------------+----------------
    Tests | 2.7-bytes | 2.7-unicode | 3.2-wide | 3.4 | 3.4-patch
    ----------------+-----------------+-----------------+-----------------+-----------------+----------------
    all | 7.83 ms (+552%) | 2.05 ms (+71%) | 3.45 ms (+188%) | 15 ms (+1152%) |
    1.2 ms ()
    replace 50% | 4.14 ms (+135%) | 1.76 ms (
    ) | 3.17 ms (+81%) | 7.76 ms
    (+342%) | 4.18 ms (+138%)
    replace 10% | 1.21 ms () | 1.52 ms (+26%) | 3.01 ms (+150%) | 2.01 ms
    (+67%) | 1.23 ms
    replace 1% | 490 us | 1.55 ms (+217%) | 2.94 ms (+501%) | 589 us (+20%) |
    489 us (
    )
    replace 2 chars | 398 us | 1.47 ms (+271%) | 2.89 ms (+632%) | 398 us | 395
    us ()
    ----------------+-----------------+-----------------+-----------------+-----------------+----------------
    Total | 14.1 ms (+88%) | 8.34 ms (+11%) | 15.5 ms (+106%) | 25.8 ms (+244%)
    | 7.49 ms (
    )
    ----------------+-----------------+-----------------+-----------------+-----------------+----------------

    **

    Compare 3.2, 3.4 and 3.4 patched:

    ----------------+-------------+-----------------+---------------
    Tests | 3.2-wide | 3.4 | 3.4-patch
    ----------------+-------------+-----------------+---------------
    all | 3.45 ms () | 15 ms (+335%) | 1.2 ms (-65%)
    replace 50% | 3.17 ms (
    ) | 7.76 ms (+145%) | 4.18 ms (+32%)
    replace 10% | 3.01 ms () | 2.01 ms (-33%) | 1.23 ms (-59%)
    replace 1% | 2.94 ms (
    ) | 589 us (-80%) | 489 us (-83%)
    replace 2 chars | 2.89 ms () | 398 us (-86%) | 395 us (-86%)
    ----------------+-------------+-----------------+---------------
    Total | 15.5 ms (
    ) | 25.8 ms (+67%) | 7.49 ms (-52%)
    ----------------+-------------+-----------------+---------------

    The patch should be completed to optimize also other Unicode kinds.

    @serhiy-storchaka
    Copy link
    Member

    The patch should be completed to optimize also other Unicode kinds.

    I'm working on it.

    Here are benchmark scripts which I use. First tests regular strings (replace
    every n-th char), second tests random strings (replace 1/n of total randomly
    distributed chars).

    @pitrou
    Copy link
    Member

    pitrou commented Oct 12, 2012

    The performance numbers are very nice, but the patch needs a comment about the optimization, IMO.

    @serhiy-storchaka
    Copy link
    Member

    After much experimentation, I suggest the new patch.

    Benchmark results (time of replacing 1 of n character (ch1 to ch2) in 100000-
    char string).

    Py3.2 Py3.3 patch n ch1 ch2 fill

    231 (-13%) 3025 (-93%) 200 1 'a' 'b' 'c'
    626 (-18%) 2035 (-75%) 511 2 'a' 'b' 'c'
    444 (-26%) 957 (-66%) 327 5 'a' 'b' 'c'
    349 (-30%) 530 (-54%) 243 10 'a' 'b' 'c'
    306 (-40%) 300 (-38%) 185 20 'a' 'b' 'c'
    280 (-54%) 169 (-23%) 130 50 'a' 'b' 'c'
    273 (-62%) 123 (-15%) 105 100 'a' 'b' 'c'
    265 (-70%) 82 (-4%) 79 1000 'a' 'b' 'c'
    230 (+4%) 3012 (-92%) 239 1 '\u010a' '\u010b' '\u010c'
    624 (-17%) 1907 (-73%) 518 2 '\u010a' '\u010b' '\u010c'
    442 (-16%) 962 (-62%) 370 5 '\u010a' '\u010b' '\u010c'
    347 (-5%) 566 (-42%) 330 10 '\u010a' '\u010b' '\u010c'
    305 (-10%) 357 (-23%) 275 20 '\u010a' '\u010b' '\u010c'
    285 (-26%) 241 (-12%) 212 50 '\u010a' '\u010b' '\u010c'
    280 (-33%) 190 (-2%) 187 100 '\u010a' '\u010b' '\u010c'
    263 (-41%) 170 (-8%) 156 1000 '\u010a' '\u010b' '\u010c'
    3355 (-85%) 3309 (-85%) 498 1 '\U0001000a' '\U0001000b' '\U0001000c'
    2290 (-65%) 2267 (-65%) 800 2 '\U0001000a' '\U0001000b' '\U0001000c'
    1598 (-62%) 1279 (-52%) 612 5 '\U0001000a' '\U0001000b' '\U0001000c'
    1313 (-60%) 950 (-45%) 519 10 '\U0001000a' '\U0001000b' '\U0001000c'
    1195 (-61%) 824 (-44%) 464 20 '\U0001000a' '\U0001000b' '\U0001000c'
    1055 (-59%) 640 (-32%) 434 50 '\U0001000a' '\U0001000b' '\U0001000c'
    982 (-55%) 549 (-20%) 439 100 '\U0001000a' '\U0001000b' '\U0001000c'
    941 (-56%) 473 (-12%) 417 1000 '\U0001000a' '\U0001000b' '\U0001000c'

    On other platforms other numbers are possible. Especially I'm interested in
    the results on Windows and on 64-bit. For the test I used the script
    replacebench2.py, and compared the results with the help of script
    https://bitbucket.org/storchaka/cpython-stuff/raw/default/bench/bench-diff.py
    .

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 29, 2012
    @serhiy-storchaka
    Copy link
    Member

    I going speed up other cases for replace(), but for now I have only this patch. Is it good? Should I apply it to 3.3 as there is a 3.3 regression?

    @benjaminp
    Copy link
    Contributor

    As __ap__ says, it would be nice to have a comment.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 30, 2012

    64-bit linux results:

    3.2 3.3 patch

    133 (-28%) 1343 (-93%) 96 1 'a' 'b' 'c'
    414 (-9%) 704 (-47%) 375 2 'a' 'b' 'c'
    319 (-8%) 491 (-40%) 293 3 'a' 'b' 'c'
    253 (-7%) 384 (-39%) 235 4 'a' 'b' 'c'
    216 (-8%) 320 (-38%) 199 5 'a' 'b' 'c'
    192 (-9%) 278 (-37%) 175 6 'a' 'b' 'c'
    175 (-10%) 247 (-36%) 157 7 'a' 'b' 'c'
    162 (-11%) 223 (-35%) 144 8 'a' 'b' 'c'
    153 (-14%) 204 (-35%) 132 9 'a' 'b' 'c'
    145 (-15%) 188 (-35%) 123 10 'a' 'b' 'c'
    108 (-36%) 112 (-38%) 69 20 'a' 'b' 'c'
    86 (-59%) 53 (-34%) 35 50 'a' 'b' 'c'
    78 (-71%) 31 (-26%) 23 100 'a' 'b' 'c'
    73 (-84%) 12 (+0%) 12 1000 'a' 'b' 'c'
    81 (-88%) 10 (+0%) 10 10000 'a' 'b' 'c'

    133 (-23%) 1470 (-93%) 103 1 '\u010a' '\u010b' '\u010c'
    414 (-10%) 799 (-54%) 371 2 '\u010a' '\u010b' '\u010c'
    319 (-5%) 576 (-47%) 303 3 '\u010a' '\u010b' '\u010c'
    254 (-1%) 461 (-46%) 251 4 '\u010a' '\u010b' '\u010c'
    216 (+2%) 391 (-44%) 220 5 '\u010a' '\u010b' '\u010c'
    193 (+4%) 341 (-41%) 200 6 '\u010a' '\u010b' '\u010c'
    175 (+5%) 303 (-39%) 184 7 '\u010a' '\u010b' '\u010c'
    163 (+6%) 275 (-37%) 172 8 '\u010a' '\u010b' '\u010c'
    153 (+6%) 252 (-36%) 162 9 '\u010a' '\u010b' '\u010c'
    145 (+7%) 235 (-34%) 155 10 '\u010a' '\u010b' '\u010c'
    108 (-1%) 133 (-20%) 107 20 '\u010a' '\u010b' '\u010c'
    86 (-27%) 66 (-5%) 63 50 '\u010a' '\u010b' '\u010c'
    79 (-44%) 44 (+0%) 44 100 '\u010a' '\u010b' '\u010c'
    74 (-66%) 24 (+4%) 25 1000 '\u010a' '\u010b' '\u010c'
    75 (-71%) 22 (+0%) 22 10000 '\u010a' '\u010b' '\u010c'

    1687 (-91%) 1362 (-89%) 150 1 '\U0001000a' '\U0001000b' '\U0001000c'
    1146 (-58%) 817 (-41%) 479 2 '\U0001000a' '\U0001000b' '\U0001000c'
    919 (-61%) 627 (-43%) 358 3 '\U0001000a' '\U0001000b' '\U0001000c'
    802 (-63%) 521 (-44%) 294 4 '\U0001000a' '\U0001000b' '\U0001000c'
    729 (-64%) 446 (-42%) 259 5 '\U0001000a' '\U0001000b' '\U0001000c'
    678 (-65%) 394 (-40%) 237 6 '\U0001000a' '\U0001000b' '\U0001000c'
    643 (-66%) 350 (-37%) 220 7 '\U0001000a' '\U0001000b' '\U0001000c'
    617 (-66%) 313 (-34%) 207 8 '\U0001000a' '\U0001000b' '\U0001000c'
    598 (-67%) 283 (-30%) 198 9 '\U0001000a' '\U0001000b' '\U0001000c'
    581 (-67%) 258 (-27%) 189 10 '\U0001000a' '\U0001000b' '\U0001000c'
    511 (-71%) 152 (-3%) 148 20 '\U0001000a' '\U0001000b' '\U0001000c'
    472 (-76%) 89 (+28%) 114 50 '\U0001000a' '\U0001000b' '\U0001000c'
    461 (-78%) 68 (+47%) 100 100 '\U0001000a' '\U0001000b' '\U0001000c'
    452 (-81%) 48 (+81%) 87 1000 '\U0001000a' '\U0001000b' '\U0001000c'
    452 (-81%) 46 (+85%) 85 10000 '\U0001000a' '\U0001000b' '\U0001000c'

    @pitrou
    Copy link
    Member

    pitrou commented Dec 30, 2012

    64-bit windows results:

    3.3 patched

    925 (-90%) 97 1 'a' 'b' 'c'
    881 (-54%) 405 2 'a' 'b' 'c'
    623 (-51%) 308 3 'a' 'b' 'c'
    482 (-48%) 252 4 'a' 'b' 'c'
    396 (-44%) 223 5 'a' 'b' 'c'
    344 (-40%) 208 6 'a' 'b' 'c'
    306 (-38%) 190 7 'a' 'b' 'c'
    276 (-37%) 173 8 'a' 'b' 'c'
    254 (-36%) 162 9 'a' 'b' 'c'
    241 (-35%) 156 10 'a' 'b' 'c'
    158 (-24%) 120 20 'a' 'b' 'c'
    107 (-12%) 94 50 'a' 'b' 'c'
    87 (+7%) 93 100 'a' 'b' 'c'
    70 (+3%) 72 1000 'a' 'b' 'c'
    63 (+8%) 68 10000 'a' 'b' 'c'

    1332 (-92%) 106 1 '\u010a' '\u010b' '\u010c'
    1137 (-60%) 459 2 '\u010a' '\u010b' '\u010c'
    836 (-58%) 347 3 '\u010a' '\u010b' '\u010c'
    660 (-56%) 288 4 '\u010a' '\u010b' '\u010c'
    567 (-55%) 256 5 '\u010a' '\u010b' '\u010c'
    503 (-51%) 245 6 '\u010a' '\u010b' '\u010c'
    455 (-47%) 242 7 '\u010a' '\u010b' '\u010c'
    414 (-44%) 231 8 '\u010a' '\u010b' '\u010c'
    387 (-41%) 227 9 '\u010a' '\u010b' '\u010c'
    365 (-35%) 237 10 '\u010a' '\u010b' '\u010c'
    256 (-21%) 201 20 '\u010a' '\u010b' '\u010c'
    185 (-9%) 168 50 '\u010a' '\u010b' '\u010c'
    186 (-19%) 150 100 '\u010a' '\u010b' '\u010c'
    137 (-1%) 136 1000 '\u010a' '\u010b' '\u010c'
    131 (+3%) 135 10000 '\u010a' '\u010b' '\u010c'

    1346 (-88%) 160 1 '\U0001000a' '\U0001000b' '\U0001000c'
    1247 (-62%) 469 2 '\U0001000a' '\U0001000b' '\U0001000c'
    965 (-64%) 352 3 '\U0001000a' '\U0001000b' '\U0001000c'
    845 (-64%) 303 4 '\U0001000a' '\U0001000b' '\U0001000c'
    720 (-65%) 251 5 '\U0001000a' '\U0001000b' '\U0001000c'
    655 (-65%) 230 6 '\U0001000a' '\U0001000b' '\U0001000c'
    604 (-58%) 256 7 '\U0001000a' '\U0001000b' '\U0001000c'
    570 (-62%) 214 8 '\U0001000a' '\U0001000b' '\U0001000c'
    546 (-63%) 203 9 '\U0001000a' '\U0001000b' '\U0001000c'
    515 (-63%) 190 10 '\U0001000a' '\U0001000b' '\U0001000c'
    404 (-61%) 157 20 '\U0001000a' '\U0001000b' '\U0001000c'
    339 (-62%) 130 50 '\U0001000a' '\U0001000b' '\U0001000c'
    308 (-60%) 122 100 '\U0001000a' '\U0001000b' '\U0001000c'
    284 (-54%) 130 1000 '\U0001000a' '\U0001000b' '\U0001000c'
    281 (-60%) 113 10000 '\U0001000a' '\U0001000b' '\U0001000c'

    @serhiy-storchaka
    Copy link
    Member

    As __ap__ says, it would be nice to have a comment.

    Oh, I thought I had already done this.

    @vstinner
    Copy link
    Member

    str_replace_1char.patch: why not implementing replace_1char_inplace() in stringlib, with one version per character type (UCS1, UCS2, UCS4)?

    I prefer unicode_2.patch algorithm because it's simpler: only one loop (vs two loops for str_replace_1char.patch, with a threshold of 10 different characters).

    Why do you changed your algorithm? Is str_replace_1char.patch algorithm more efficient than unicode_2.patch algorithm? Is the speedup really interesting?

    @serhiy-storchaka
    Copy link
    Member

    str_replace_1char.patch: why not implementing replace_1char_inplace() in
    stringlib, with one version per character type (UCS1, UCS2, UCS4)?

    Because there are no benefits to do it. All three versions (UCS1, UCS2, and
    UCS4) have no any common code. The best implementation used for every kind of
    strings. For UCS1 it uses fast memchr() (findchar() has some overhead here),
    for UCS2 it uses findchar(), and for UCS4 it uses a dumb loop, because
    findchar() will be too ineffective here.

    I prefer unicode_2.patch algorithm because it's simpler: only one loop (vs
    two loops for str_replace_1char.patch, with a threshold of 10 different
    characters).

    Yes, UCS1-implementation in str_replace_1char.patch is more complicated, but
    it is faster for more input strings. memchr() is more effective than a simple
    loop when the replaceable characters are rare. But when they meet often, a
    simple cycle is more efficient. The "attempts" counter determines how many
    characters will be checked before using memchr(). This speeds up the
    replacement in strings with frequent replacements, but a little slow down the
    replacement in strings with rare replacements. 10 is a compromise.
    str_replace_1char.patch speed up not only case when *each* character replaced,
    but when 1/2, 1/3, 1/5,... characters replaced.

    Why do you changed your algorithm? Is str_replace_1char.patch algorithm
    more efficient than unicode_2.patch algorithm? Is the speedup really
    interesting?

    You can run benchmarks and compare results. str_replace_1char.patch provides
    not the best performance, but most stable results for wide sort of strings,
    and has no regressions comparing with 3.2.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2013

    How can we move this issu forward? I still prefer unicode_2.patch over str_replace_1char.patch because the code is simpler and so easier to maintain.

    str_replace_1char.patch has a "bug": replace_1char() does not use "pos" for the latin1 path.

    @terryjreedy
    Copy link
    Member

    My experiments last September, before this was filed, showed that str.find (index) had most of the relative slowdown of str.replace. I assumed at that time that .replace used .find or .index to find substrings to replace, so that the fix for .replace would include speeding up .find/index. Looking at the patches, I see that I was wrong; I guess because .replace copies as it goes. I will open a separate issue sometime unless .find gets fixed here.

    For both .find and .replace, the regression was worse on Windows than on linux, so the patches should be tested on Windows also. If I can get vc++ express 2008 installed and working on my current substitute machine. I will give them a try.

    @serhiy-storchaka
    Copy link
    Member

    Here is an updated patch. Some comments added (I will be grateful for help in the improvement of these comments), an implementation moved to stringlib (a new file Objects/stringlib/replace.h added).

    unicode_2.patch optimizes only too special case and I consider this is not worth the effort. str_replace_1char*.patch cover a wider area and designed to be faster than 3.2 and 3.3 in most cases and not to be significant slower in corner cases.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 7, 2013

    str_replace_1char_2.patch looks good to me. Just one nit: please add a reference to this issue in the comment (in replace.h).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 13, 2013

    New changeset d396e0716bf4 by Serhiy Storchaka in branch 'default':
    Issue bpo-16061: Speed up str.replace() for replacing 1-character strings.
    http://hg.python.org/cpython/rev/d396e0716bf4

    @serhiy-storchaka
    Copy link
    Member

    Thanks to Ezio Melotti and Daniel Shahaf for their great help in correcting my clumsy wording.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants