str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions) #65317

Closed
MojoVampire mannequin opened this issue Mar 31, 2014 · 31 comments
Labels
performance (Performance or resource usage), topic-unicode

Comments

MojoVampire (mannequin) commented Mar 31, 2014

BPO 21118
Nosy @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka, @MojoVampire
Files
  • translate_timing.py: File to time various ways of using str.translate and its "competing" functions
  • translate_writer.patch
  • translate_script.py
  • translate_writer_bench.txt
  • fast_translate.patch
  • translate_script_ascii.py
  • translate_cached.patch
  • translate_cached_2.patch
  • bench_translate.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2014-04-05.13:03:50.934>
    created_at = <Date 2014-03-31.23:09:06.297>
    labels = ['expert-unicode', 'performance']
    title = 'str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions)'
    updated_at = <Date 2014-04-08.07:14:31.298>
    user = 'https://github.com/MojoVampire'

    bugs.python.org fields:

    activity = <Date 2014-04-08.07:14:31.298>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-04-05.13:03:50.934>
    closer = 'vstinner'
    components = ['Unicode']
    creation = <Date 2014-03-31.23:09:06.297>
    creator = 'josh.r'
    dependencies = []
    files = ['34688', '34691', '34727', '34728', '34731', '34732', '34734', '34736', '34744']
    hgrepos = []
    issue_num = 21118
    keywords = ['patch']
    message_count = 31.0
    messages = ['215283', '215301', '215339', '215387', '215473', '215538', '215539', '215540', '215541', '215551', '215552', '215553', '215576', '215577', '215586', '215587', '215594', '215598', '215599', '215600', '215602', '215606', '215610', '215612', '215673', '215676', '215678', '215680', '215685', '215686', '215742']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'ezio.melotti', 'python-dev', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue21118'
    versions = ['Python 3.4', 'Python 3.5']

    MojoVampire (mannequin) commented Mar 31, 2014

    String translation functions, at least in other languages, are typically a performance optimization when you *really* need the speed. In exchange for paying the memory cost and one-time work to create a translation table, you get much faster 1 to 1 and deletion of characters.

    In Python 3, str.translate is usually a performance pessimization, not an optimization. It does do a few things similar functions (e.g. str.replace) can't: it can perform a translation where the inputs and outputs overlap ('abc' -> 'bca' wouldn't work with chained str.replace calls), and it handles the full Unicode range ('\u0f01\u0f02' -> '\u0f03\u0f04' can't be done by encoding, using bytes.translate, then decoding again). But when another approach is available, str.translate is almost always the slowest.

    While it may not be practical to make the general case for str.translate match the performance of bytes.translate, it should at least be possible to improve upon it in many common cases.
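    To make those two cases concrete, here is a minimal illustration (added here for clarity; not part of the attached timing script):

        # Overlapping 1:1 translation: 'abc' -> 'bca' must happen in one pass;
        # chained str.replace calls would clobber earlier substitutions.
        table = str.maketrans('abc', 'bca')
        print('abcabc'.translate(table))        # -> 'bcabca'

        # Full Unicode range: characters outside latin-1 cannot round-trip
        # through an 8-bit encoding, so encode/bytes.translate/decode is out.
        wide = str.maketrans('\u0f01\u0f02', '\u0f03\u0f04')
        print('\u0f01\u0f02'.translate(wide))   # -> '\u0f03\u0f04'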

    My timing data (see attached file for how it was produced):

        $ python.exe translate_timing.py
    Testing 1-1 translation
    bytes.translate 0.05638575777521804
    str.translate 2.9639258423152874
    str.translate from bytes trans 1.6307581351818357
    str.encode->bytes.translate->bytes.decode 0.09472254504689914
    str.replace 0.2507745649580393
    
    Testing deletion
    bytes.translate 0.05367317344084199
    str.translate 2.360142494240577
    str.encode->bytes.translate->bytes.decode 0.0629345277274993
    str.replace 0.31173443413690904
    
    Testing enlarging translations
    str.translate 2.33631417640283
    str.replace 0.41248187325160757
    

    Obviously, this is somewhat contrived, but it illustrates my point that the common cases (ASCII or latin-1 strings) are far too slow. In every case, str.translate loses to every competitor, including itself when you pass it a translation table produced by bytes.maketrans, and including the case where a pure Python function calls str.replace over and over.

    A large part of the problem is clearly the use of a dict to perform lookups; as seen in the 1-1 test case, str.translate using the result of str.maketrans takes nearly twice as long as when it uses the result of a similar bytes.maketrans. While I have not profiled the code, I suspect the remainder is a combination of the overhead of converting Py_UCS4 to PyLong for general-purpose lookups, handling the multiple ways the translation target can be expressed (a Unicode ordinal or a str object), and the code that checks for and resizes the output buffer each time.
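    A quick way to see the difference in table representations (again, illustrative only):

        # str.maketrans returns a plain dict keyed by int code points, so every
        # character translated needs a PyLong box plus a dict lookup...
        print(str.maketrans('abc', 'xyz'))        # {97: 120, 98: 121, 99: 122}

        # ...while bytes.maketrans returns a flat 256-byte table that C code
        # can index directly.
        table = bytes.maketrans(b'abc', b'xyz')
        print(len(table), table[ord('a')])        # 256 120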

    I have ideas for a solution involving a special-purpose translation object to be produced by str.maketrans for cases where the translation table will produce an equal or smaller string (all "to" values are None or strings of length 1 or less), and the "from" values all fall in a small enough range (say, 1024) to avoid excessive memory usage from large translation table objects. But I'd like to know whether such a change (including modifying the behavior of str.maketrans to produce an unspecified mapping object, instead of the current dict) is desired, acceptable, what have you.
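    A rough pure-Python sketch of the idea (hypothetical; the real object would be implemented in C and would avoid boxing code points):

        UNMAPPED = object()   # sentinel: code point not present in the table

        def build_dense_table(mapping, limit=1024):
            """Flatten a str.maketrans-style dict into a list indexed by code
            point; return None if the mapping is unsuitable (keys >= limit, or
            replacement strings longer than one character)."""
            table = [UNMAPPED] * limit
            for key, value in mapping.items():
                if key >= limit:
                    return None
                if isinstance(value, str):
                    if len(value) > 1:
                        return None
                    value = ord(value) if value else None
                table[key] = value            # int code point, or None to delete
            return table

        def translate_dense(text, table):
            out = []
            for ch in text:
                cp = ord(ch)
                target = table[cp] if cp < len(table) else UNMAPPED
                if target is UNMAPPED:
                    out.append(ch)            # not mapped: keep the character
                elif target is not None:
                    out.append(chr(target))   # 1:1 mapping; None means delete
            return ''.join(out)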

    MojoVampire (mannequin) added the topic-unicode and performance labels Mar 31, 2014
    MojoVampire (mannequin) changed the title from "str.translate is absurdly slow in majority of use cases (takes 3x to 10x longer than similar functions)" to "str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions)" Mar 31, 2014
    vstinner commented Apr 1, 2014

    str.translate() currently allocates a buffer of UCS4 characters.

    translate_writer.patch:

    • modify _PyUnicode_TranslateCharmap() to use the _PyUnicodeWriter API
    • drop optimizations for error handlers other than "ignore", because there are no unit tests for them and str.translate() uses "ignore". It's safer to drop untested optimizations.
    • also clean up the code: charmaptranslate_output() is now responsible for handling the charmaptranslate_lookup() result (decrementing the reference counter)

    str.translate() may be a little bit faster when translating ASCII to ASCII for large strings, but not by much.

    bytes.translate() is much faster because it builds a C array of 256 items for fast table lookups, whereas str.translate() requires a Python dict lookup for each character, which is much slower.

    codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 8-bit encodings. We may reuse it for simple cases, like translating ASCII to ASCII.
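    For reference, here is roughly how the generated 8-bit codec modules already use charmap_build() (an illustrative example, not part of translate_writer.patch):

        import codecs
        import encodings.cp1252 as cp1252

        # decoding_table is a 256-character str; charmap_build() turns it into
        # the fast three-level EncodingMap object used by the C encoder.
        encoding_map = codecs.charmap_build(cp1252.decoding_table)
        data, consumed = codecs.charmap_encode('héllo', 'strict', encoding_map)
        print(data)        # b'h\xe9llo'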

    MojoVampire (mannequin) commented Apr 1, 2014

    @Haypo: Are you planning to run with this then? Or shall I pick up where your patch leaves off?

    Thanks for the pointer to the codecs charmap_build; it's not documented anywhere that I can tell, so I didn't even know it existed. Now to dig into the source to figure out what it does and how to use it (both in C and Python).

    I know about the fundamental design differences that make bytes.translate kick str.translate's butt; my plan to make a special-purpose mapping-like object that didn't require C-layer code to wrap Py_UCS4 in PyLong, or pull Py_UCS4 out of PyUnicode/PyLong objects, was largely an attempt to get a bytes.translate-like implementation. But if codecs.charmap_build()/PyUnicode_BuildEncodingMap() can get me 90% of the way there without reinventing the wheel, it's clearly worth looking at.

    Again, let me know if it's okay if I try to finish off the work you began, or if you'd prefer to do it yourself.

    pitrou commented Apr 2, 2014

    bytes.translate() is much faster because it builds a C array of 256 items for fast table lookups, whereas str.translate() requires a Python dict lookup for each character, which is much slower.

    It should be easy to devise a simple hash table for the common case of one-to-one translation (and also deletion).

    MojoVampire (mannequin) commented Apr 3, 2014

    Hmm... Maybe I'm testing it wrong, but I'm finding your patch is slowing down translation by a small amount on my test case; not a lot, maybe a 10% slowdown on enlarging translations, 6% for 1-1, and deletion in between.

    vstinner commented Apr 4, 2014

    translate_script.py: microbenchmark for str.translate() with various lengths, charsets (ASCII, latin1, UCS2, UCS4) and percentages of replacement. Result:

    ---------------------------+----------+---------
    Summary                    | original |   writer
    ---------------------------+----------+---------
    replace none, length=10    |  5.65 us |  5.67 us
    replace none, length=10**3 |   489 us |   486 us
    replace none, length=10**6 |   487 ms |   484 ms
    replace 10%, length=10     |  5.47 us |  5.48 us
    replace 10%, length=10**3  |   455 us |   455 us
    replace 10%, length=10**6  |   455 ms |   455 ms
    replace 50%, length=10     |  4.71 us |   4.7 us
    replace 50%, length=10**3  |   372 us |   374 us
    replace 50%, length=10**6  |   371 ms |   373 ms
    replace 90%, length=10     |  3.88 us |  3.93 us
    replace 90%, length=10**3  |   293 us |   297 us
    replace 90%, length=10**6  |   293 ms |   295 ms
    replace all, length=10     |  3.58 us |  3.59 us
    replace all, length=10**3  |   263 us |   262 us
    replace all, length=10**6  |   262 ms |   261 ms
    ---------------------------+----------+---------
    Total                      | 1.87 sec | 1.87 sec
    ---------------------------+----------+---------

    It's exactly the same with translate_writer.patch.

    vstinner commented Apr 4, 2014

    translate_writer_bench.txt: full output of the micro-benchmark.

    serhiy-storchaka commented:

    • drop optimizations for error handlers other than "ignore", because there are no unit tests for them and str.translate() uses "ignore". It's safer to drop untested optimizations.

    It is unsafe to do such nontrivial modifications for untested code.

    vstinner commented Apr 4, 2014

    Oh, I tested the wrong patch :-( Here is the updated benchmark summary:

    ---------------------------+--------------+---------------
    Summary                    |     original |         writer
    ---------------------------+--------------+---------------
    replace none, length=10    |  5.65 us (*) |        5.53 us
    replace none, length=10**3 |   489 us (*) |         475 us
    replace none, length=10**6 |   487 ms (*) |         476 ms
    replace 10%, length=10     |  5.47 us (*) |        5.25 us
    replace 10%, length=10**3  |   455 us (*) |         447 us
    replace 10%, length=10**6  |   455 ms (*) |         447 ms
    replace 50%, length=10     |  4.71 us (*) |  4.42 us (-6%)
    replace 50%, length=10**3  |   372 us (*) |   352 us (-5%)
    replace 50%, length=10**6  |   371 ms (*) |   352 ms (-5%)
    replace 90%, length=10     |  3.88 us (*) |  3.53 us (-9%)
    replace 90%, length=10**3  |   293 us (*) |   270 us (-8%)
    replace 90%, length=10**6  |   293 ms (*) |   271 ms (-7%)
    replace all, length=10     |  3.58 us (*) | 3.24 us (-10%)
    replace all, length=10**3  |   263 us (*) |  236 us (-10%)
    replace all, length=10**6  |   262 ms (*) |  235 ms (-10%)
    ---------------------------+--------------+---------------
    Total                      | 1.87 sec (*) |       1.78 sec
    ---------------------------+--------------+---------------

    str.translate() is a little bit faster with translate_writer.patch, but I don't really care if it's not faster; the patch is mainly a cleanup of the code.

    python-dev (mannequin) commented Apr 4, 2014

    New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
    Issue bpo-21118: Fix _PyUnicodeTranslateError_Create(), add missing format
    http://hg.python.org/cpython/rev/95d4e8124c6a

    New changeset 03b1dd29b327 by Victor Stinner in branch 'default':
    Issue bpo-21118: Use _PyUnicodeWriter API in str.translate() to simplify and
    http://hg.python.org/cpython/rev/03b1dd29b327

    vstinner commented Apr 4, 2014

    New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
    Issue bpo-21118: Fix _PyUnicodeTranslateError_Create(), add missing format
    http://hg.python.org/cpython/rev/95d4e8124c6a

    It looks like _PyUnicode_TranslateCharmap() didn't work with error handlers other than "strict", "replace", "ignore" and "xmlcharrefreplace": the UnicodeTranslateError constructor crashed before my change.

    python-dev (mannequin) commented Apr 4, 2014

    New changeset 70990e795657 by Victor Stinner in branch '3.4':
    Issue bpo-21118: Fix _PyUnicodeTranslateError_Create(), add missing format
    http://hg.python.org/cpython/rev/70990e795657

    vstinner commented Apr 4, 2014

    fast_translate.patch: here is the real patch, which optimizes translating ASCII to ASCII. It's *much* faster:

    ---------------------------+-------------+---------------
    Tests                      |     writerA |          fastA
    ---------------------------+-------------+---------------
    replace none, length=10    |  710 ns (*) |  169 ns (-76%)
    replace none, length=10**3 | 59.1 us (*) |  997 ns (-98%)
    replace none, length=10**6 | 59.2 ms (*) |  805 us (-99%)
    replace 10%, length=10     |  678 ns (*) |  187 ns (-72%)
    replace 10%, length=10**3  | 58.7 us (*) | 1.02 us (-98%)
    replace 10%, length=10**6  | 57.5 ms (*) |  817 us (-99%)
    replace 50%, length=10     |  559 ns (*) |  188 ns (-66%)
    replace 50%, length=10**3  | 43.6 us (*) | 1.02 us (-98%)
    replace 50%, length=10**6  | 43.4 ms (*) |  811 us (-98%)
    replace 90%, length=10     |  437 ns (*) |  187 ns (-57%)
    replace 90%, length=10**3  | 32.4 us (*) | 1.02 us (-97%)
    replace 90%, length=10**6  | 31.8 ms (*) |  805 us (-97%)
    replace all, length=10     |  386 ns (*) |  121 ns (-69%)
    replace all, length=10**3  | 27.3 us (*) |  955 ns (-96%)
    replace all, length=10**6  | 27.2 ms (*) |  807 us (-97%)
    ---------------------------+-------------+---------------
    Total                      |  219 ms (*) | 4.05 ms (-98%)
    ---------------------------+-------------+---------------

    I'm not sure yet that the patch doesn't make non-ASCII cases slower.

    vstinner commented Apr 4, 2014

    translate_script_ascii.py: the benchmark script used to test fast_translate.patch.

    python-dev (mannequin) commented Apr 5, 2014

    New changeset 9acc8196a82c by Victor Stinner in branch 'default':
    Issue bpo-21118: Remove unused variable
    http://hg.python.org/cpython/rev/9acc8196a82c

    New changeset bd594dd71d46 by Victor Stinner in branch 'default':
    Issue bpo-21118: Add more unit tests on str.translate()
    http://hg.python.org/cpython/rev/bd594dd71d46

    python-dev (mannequin) commented Apr 5, 2014

    New changeset cca6b056236a by Victor Stinner in branch 'default':
    Issue bpo-21118: Optimize str.translate() for ASCII => ASCII translation
    http://hg.python.org/cpython/rev/cca6b056236a

    New changeset 6a347c0ffbfc by Victor Stinner in branch 'default':
    Issue bpo-21118: Add unit test for invalid character replacement (code point higher than U+10ffff)
    http://hg.python.org/cpython/rev/6a347c0ffbfc

    serhiy-storchaka commented:

    fast_translate.patch works only with ASCII input strings and an ASCII 1:1 mapping. Is this actually a typical case?

    Here is a patch which uses a different approach: it caches values for ASCII keys. It works with all types of input strings and mappings and can speed up more use cases, including non-ASCII data, deletion and enlarging.

    translate_timing.py results:

                                         unpatched             patched

    Testing 1-1 translation
    str.translate                    4.55125927699919      0.7898181750006188
    str.translate from bytes trans   1.8910855210015143    0.779950579000797
    Testing deletion
    str.translate                    4.481863372000589     0.7718261509999138
    Testing enlarging translations
    str.translate                    4.421521270000085     0.9290620680003485

    translate_script_ascii.py results:

    ---------------------------+---------------------------+-------------------------------
    Tests                      | translate_script_ascii.34 | translate_script_ascii.cached3
    ---------------------------+---------------------------+-------------------------------
    replace none, length=10    |           6.12 us (+176%) |                    2.22 us (*)
    replace none, length=10**3 |           448 us (+1293%) |                    32.2 us (*)
    replace none, length=10**6 |           474 ms (+1435%) |                    30.9 ms (*)
    replace 10%, length=10     |           5.73 us (+133%) |                    2.46 us (*)
    replace 10%, length=10**3  |           412 us (+1060%) |                    35.5 us (*)
    replace 10%, length=10**6  |           442 ms (+1204%) |                    33.9 ms (*)
    replace 50%, length=10     |            4.75 us (+85%) |                    2.57 us (*)
    replace 50%, length=10**3  |            311 us (+552%) |                    47.7 us (*)
    replace 50%, length=10**6  |            331 ms (+617%) |                    46.2 ms (*)
    replace 90%, length=10     |            3.36 us (+29%) |                    2.59 us (*)
    replace 90%, length=10**3  |            178 us (+250%) |                    50.8 us (*)
    replace 90%, length=10**6  |            192 ms (+291%) |                    49.2 ms (*)
    replace all, length=10     |            2.64 us (+28%) |                    2.06 us (*)
    replace all, length=10**3  |            146 us (+189%) |                    50.3 us (*)
    replace all, length=10**6  |            152 ms (+194%) |                    51.7 ms (*)
    ---------------------------+---------------------------+-------------------------------
    Total                      |          1.59 sec (+651%) |                     212 ms (*)
    ---------------------------+---------------------------+-------------------------------

    python-dev (mannequin) commented Apr 5, 2014

    New changeset 47b0c076e17d by Victor Stinner in branch 'default':
    Issue bpo-21118: Optimize also str.translate() for ASCII => ASCII deletion
    http://hg.python.org/cpython/rev/47b0c076e17d

    vstinner commented Apr 5, 2014

    Serhiy wrote:

    fast_translate.patch works only with ASCII input strings and an ASCII 1:1 mapping. Is this actually a typical case?

    I just checked the Python stdlib: as expected, all usages of str.translate() except email.quoprimime use ASCII 1:1. My optimization is only used if the input string is ASCII, but I expect that most strings are just ASCII.

    ** distutils: ASCII => ASCII (1:1)

       longopt_xlate = str.maketrans('-', '_')
    and
       WS_TRANS = {ord(_wschar) : ' ' for _wschar in string.whitespace};
    .. text = text.translate(WS_TRANS)

    ** email.quoprimime:

        encoded = header_bytes.decode('latin1').translate(_QUOPRI_HEADER_MAP)
    and
        body = body.translate(_QUOPRI_BODY_ENCODE_MAP)

    => my optimization is used if the input string contains "safe header
    characters" (a-z, A-Z, 0-9, space and "-!*+/"). It should be the
    common case for emails.

    ** rot13 encoding: ASCII 1:1

    ** idlelib.PyParse: ASCII 1:1

            str = str.translate(_tran)

    ** textwrap: ASCII 1:1

                text = text.translate(self.unicode_whitespace_trans)

    ** zipfile: ASCII 1:1

            arcname = arcname.translate(table)

    serhiy-storchaka commented:

    The previous patch and results were against the source code before fast_translate.patch was committed. Here is a patch synchronized with the current code.

    translate_timing.py results:

                                         unpatched             patched

    Testing 1-1 translation
    str.translate                    0.5265426559999469    0.6120695240006171
    str.translate from bytes trans   0.2608634099997289    0.32327288200031035
    Testing deletion
    str.translate                    4.331346814999051     0.7960810519998631
    Testing enlarging translations
    str.translate                    4.392650978999882     0.9280614529998275

    translate_script_ascii.py results:

    ---------------------------+------------------------------+-------------------------------
    Tests                      | translate_script_ascii.fastA | translate_script_ascii.cached2
    ---------------------------+------------------------------+-------------------------------
    replace none, length=10    |                  1.54 us (*) |                 2.07 us (+34%)
    replace none, length=10**3 |                  10.5 us (*) |                        10.6 us
    replace none, length=10**6 |                      12.6 ms |                    12.3 ms (*)
    replace 10%, length=10     |                  1.69 us (*) |                 2.31 us (+37%)
    replace 10%, length=10**3  |                  10.6 us (*) |                        10.7 us
    replace 10%, length=10**6  |                      12.6 ms |                    12.1 ms (*)
    replace 50%, length=10     |                  1.69 us (*) |                 2.31 us (+36%)
    replace 50%, length=10**3  |                  10.6 us (*) |                        10.9 us
    replace 50%, length=10**6  |                      12.6 ms |                    12.3 ms (*)
    replace 90%, length=10     |                  1.69 us (*) |                 2.26 us (+34%)
    replace 90%, length=10**3  |                  10.6 us (*) |                        10.7 us
    replace 90%, length=10**6  |                      12.6 ms |                    12.2 ms (*)
    replace all, length=10     |                  1.07 us (*) |                 1.68 us (+56%)
    replace all, length=10**3  |                  9.28 us (*) |                        9.46 us
    replace all, length=10**6  |                      12.6 ms |                      12 ms (*)
    ---------------------------+------------------------------+-------------------------------
    Total                      |                        63 ms |                    60.9 ms (*)
    ---------------------------+------------------------------+-------------------------------

    translate_cached_2.patch slightly slows down translation of short ASCII strings (due to cache initialization and freeing), but it speeds up deletion and enlarging and works with non-ASCII data.
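    For illustration only, a pure-Python model of the caching idea (the actual patch does this in C inside the translate code):

        def translate_with_ascii_cache(text, mapping):
            """Look up each ASCII code point in the mapping at most once and
            remember the answer; non-ASCII code points use a normal lookup."""
            KEEP = object()              # sentinel: code point not in the mapping
            cache = [None] * 128         # None means "not looked up yet"

            def lookup(cp):
                try:
                    target = mapping[cp]
                except LookupError:
                    return KEEP
                if target is None:
                    return ''            # delete the character
                return chr(target) if isinstance(target, int) else target

            out = []
            for ch in text:
                cp = ord(ch)
                if cp < 128:
                    cached = cache[cp]
                    if cached is None:
                        cached = cache[cp] = lookup(cp)
                    target = cached
                else:
                    target = lookup(cp)
                out.append(ch if target is KEEP else target)
            return ''.join(out)

        # e.g. translate_with_ascii_cache('a-b-c', {ord('-'): '_'}) -> 'a_b_c'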

    vstinner commented Apr 5, 2014

    bench_translate.py: benchmark ASCII 1:1 but also ASCII 1:1 with deletion. Results of the benchmark comparing tip (47b0c076e17d which includes my latest optimization on deletion) and 6a347c0ffbfc + translate_cached_2.patch.

    Common platform:
    Python unicode implementation: PEP-393
    CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    Platform: Linux-3.12.8-300.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
    Bits: int=32, long=64, long long=64, size_t=64, void*=64
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Timer: time.perf_counter
    Timer precision: 45 ns
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)

    Platform of campaign remove:
    SCM: hg revision=47b0c076e17d tag=tip branch=default date="2014-04-05 14:27 +0200"
    Python version: 3.5.0a0 (default:47b0c076e17d, Apr 5 2014, 14:50:53) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
    Date: 2014-04-05 14:51:55

    Platform of campaign cache:
    SCM: hg revision=6a347c0ffbfc+ branch=default date="2014-04-05 11:56 +0200"
    Python version: 3.5.0a0 (default:6a347c0ffbfc+, Apr 5 2014, 14:53:02) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
    Date: 2014-04-05 14:53:12

    ---------------------------+-------------+----------------
    Tests                      |      remove |           cache
    ---------------------------+-------------+----------------
    replace none, length=10    |  184 ns (*) |  275 ns (+50%)
    replace none, length=10**3 | 1.06 us (*) |         1.1 us
    replace none, length=10**6 |  827 us (*) |         792 us
    replace 10%, length=10     |  207 ns (*) |  298 ns (+44%)
    replace 10%, length=10**3  | 1.08 us (*) |        1.12 us
    replace 10%, length=10**6  |  828 us (*) |         793 us
    replace 50%, length=10     |  205 ns (*) |  298 ns (+46%)
    replace 50%, length=10**3  | 1.08 us (*) |  1.17 us (+7%)
    replace 50%, length=10**6  |  827 us (*) |         793 us
    replace 90%, length=10     |  208 ns (*) |  298 ns (+44%)
    replace 90%, length=10**3  | 1.09 us (*) |        1.13 us
    replace 90%, length=10**6  |  850 us (*) |   793 us (-7%)
    replace all, length=10     |  145 ns (*) |  226 ns (+56%)
    replace all, length=10**3  | 1.03 us (*) |        1.04 us
    replace all, length=10**6  |  827 us (*) |         792 us
    remove none, length=10     |  184 ns (*) |  274 ns (+49%)
    remove none, length=10**3  | 1.07 us (*) |        1.09 us
    remove none, length=10**6  |  836 us (*) |   793 us (-5%)
    remove 10%, length=10      |  223 ns (*) |  408 ns (+83%)
    remove 10%, length=10**3   | 1.45 us (*) | 9.13 us (+531%)
    remove 10%, length=10**6   | 1.08 ms (*) | 8.73 ms (+706%)
    remove 50%, length=10      |  221 ns (*) |  407 ns (+84%)
    remove 50%, length=10**3   | 1.23 us (*) | 8.28 us (+575%)
    remove 50%, length=10**6   |  948 us (*) |  7.9 ms (+734%)
    remove 90%, length=10      |  230 ns (*) |  375 ns (+63%)
    remove 90%, length=10**3   | 1.57 us (*) | 3.86 us (+145%)
    remove 90%, length=10**6   | 1.28 ms (*) | 3.49 ms (+173%)
    remove all, length=10      |  139 ns (*) |  266 ns (+92%)
    remove all, length=10**3   | 1.24 us (*) |  2.46 us (+99%)
    remove all, length=10**6   | 1.07 ms (*) | 2.13 ms (+100%)
    ---------------------------+-------------+----------------
    Total                      | 9.38 ms (*) |  27 ms (+188%)
    ---------------------------+-------------+----------------

    Your patch is always slower for the common case (ASCII => ASCII translation).

    I implemented the most obvious optimization for the most common case (ASCII 1:1 and ASCII 1:1 with deletion). I consider that the current code is enough to close this issue.

    Josh Rosenberg: Thanks for the report. The current implementation should be almost as fast as bytes.translate() (the "60x" factor you mentioned in the title) for ASCII 1:1 mappings.

    --

    Serhiy: If you are interested in optimizing str.translate() for the general case (larger charsets), please open a new issue. It will probably require a more complex "cache". You may take a look at the charmap codec, which has such a more complex cache (a cache with 3 levels); see my message msg215301.

    IMO it's not worth investing much time in optimizing str.translate(); it's not a common function. It took some years before a user ran a benchmark on it :-)

    vstinner closed this as completed Apr 5, 2014
    python-dev (mannequin) commented Apr 5, 2014

    New changeset e56c71c6b45e by Victor Stinner in branch 'default':
    Issue bpo-21118: str.translate() now raises a ValueError, not a TypeError, if the
    http://hg.python.org/cpython/rev/e56c71c6b45e

    serhiy-storchaka commented:

    Saturday, 05-Apr-2014 12:49:22, you wrote:

    STINNER Victor added the comment:

    Serhiy wrote:
    > fast_translate.patch works only with ASCII input strings and an ASCII 1:1
    > mapping. Is this actually a typical case?
    I just checked the Python stdlib: as expected, all usages of str.translate() except email.quoprimime use ASCII 1:1.

    Because str.translate() is much slower than a series of str.replace() calls (which are already optimized), some usages of str.translate() were rewritten to use str.replace(); see for example html.escape(). That is what this issue is about.

    My optimization is only used if the input string is ASCII, but I expect that most strings are just ASCII.

    In most (if not all) of these cases the input string can be non-ASCII.

    bench_translate.py: benchmark ASCII 1:1 but also ASCII 1:1 with deletion.

    Could you please provide bench_translate.py?

    It will probably require a more complex "cache". You may take a look at the charmap codec, which has such a more complex cache (a cache with 3 levels); see my message msg215301.

    I was going to do this as the next step. A full cache can grow up to 1114112 characters, so I planned to cache only BMP characters (a cache with 2 levels).

    You commit too fast; I can't keep up with you. ;)

    serhiy-storchaka commented:

    Do you plan to backport bug fixes to 3.3 and 2.7?

    MojoVampire (mannequin) commented Apr 7, 2014

    Thanks for addressing this so fast. Annoyingly, I suspect it will not help the original case that led me to finding the slowdown (I had some code that was translating from 56 latin-1 Romance characters with diacritics to the equivalent ASCII characters, so it was 1-1, but it was always mapping from non-ASCII to ASCII, and therefore won't benefit from a change that only caches code points 0-127).

    That said, every *other* time I've used str.translate has been in an ASCII->ASCII scenario, where this *will* help. So again, thank you.
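    For the record, the kind of table I mean looks like this (a shortened illustration, not my actual 56-character table); every key is above 127, so an ASCII-only fast path never applies:

        DIACRITICS = str.maketrans('àáâãäçèéêë', 'aaaaaceeee')
        print('déjà élevé'.translate(DIACRITICS))   # -> 'deja eleve'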

    vstinner commented Apr 7, 2014

    Could you please provide bench_translate.py?

    Oh, I forgot to add it :-(

    vstinner commented Apr 7, 2014

    Do you plan to backport bug fixes to 3.3 and 2.7?

    Which commit?

    3.3 is no longer open to bug fixes, only security fixes.

    New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
    Issue bpo-21118: Fix _PyUnicodeTranslateError_Create(), add missing format
    http://hg.python.org/cpython/rev/95d4e8124c6a

    Python 2.7 is not affected by this issue.

    vstinner commented Apr 7, 2014

    I opened the issue bpo-21165 to discuss a more generic optimization of str.translate().

    serhiy-storchaka commented:

    > Do you plan to backport bug fixes to 3.3 and 2.7?
    Which commit?

    A fix of _PyUnicode_TranslateCharmap().

    3.3 is no longer open to bug fixes, only security fixes.

    Oh, I meant 3.4.

    vstinner commented Apr 7, 2014

    A fix of _PyUnicode_TranslateCharmap().

    Sorry, I don't see which commit is a bugfix and is not backported to Python 3.4 yet.

    python-dev (mannequin) commented Apr 8, 2014

    New changeset cb632988bc09 by Victor Stinner in branch 'default':
    Issue bpo-21118: PyLong_AS_LONG() result type is long
    http://hg.python.org/cpython/rev/cb632988bc09

    ezio-melotti transferred this issue from another repository Apr 10, 2022