classification
Title: str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions)
Type: performance Stage: patch review
Components: Unicode Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, josh.r, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2014-03-31 23:09 by josh.r, last changed 2014-04-08 07:14 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
translate_timing.py josh.r, 2014-03-31 23:09 File to time various ways of using str.translate and its "competing" functions
translate_writer.patch vstinner, 2014-04-01 08:23 review
translate_script.py vstinner, 2014-04-04 17:29
translate_writer_bench.txt vstinner, 2014-04-04 17:30
fast_translate.patch vstinner, 2014-04-04 23:38
translate_script_ascii.py vstinner, 2014-04-04 23:43
translate_cached.patch serhiy.storchaka, 2014-04-05 11:47 review
translate_cached_2.patch serhiy.storchaka, 2014-04-05 12:51 review
bench_translate.py vstinner, 2014-04-07 08:34
Messages (31)
msg215283 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-03-31 23:09
String translation functions, at least in other languages, are typically a performance optimization when you *really* need the speed. In exchange for paying the memory cost and one-time work to create a translation table, you get much faster 1 to 1 and deletion of characters.

In Python 3, str.translate is usually a performance pessimization, not optimization. While it does a few things similar functions (e.g. str.replace) can't do, like perform a translation where the inputs and outputs overlap ('abc' -> 'bca' wouldn't work with str.replace), and it handles the full Unicode range ('\u0f01\u0f02' -> '\u0f03\u0f04' can't be done by encoding, using bytes.translate, then decoding again), when another approach is available, str.translate is almost always the slowest.

While it may not be practical to make the general case for str.translate match the performance of bytes.translate, it should at least be possible to improve upon it in many common cases.

My timing data (see attached file for how it was produced):

    $ python.exe translate_timing.py

    Testing 1-1 translation
    bytes.translate 0.05638575777521804
    str.translate 2.9639258423152874
    str.translate from bytes trans 1.6307581351818357
    str.encode->bytes.translate->bytes.decode 0.09472254504689914
    str.replace 0.2507745649580393

    Testing deletion
    bytes.translate 0.05367317344084199
    str.translate 2.360142494240577
    str.encode->bytes.translate->bytes.decode 0.0629345277274993
    str.replace 0.31173443413690904

    Testing enlarging translations
    str.translate 2.33631417640283
    str.replace 0.41248187325160757

Obviously, this is somewhat contrived, but it illustrates my point that the common case (ASCII or latin-1 strings) are far too slow. In every case, str.translate loses to every competitor, including itself when you pass it a translation table produced by bytes.translate, and including the case where a pure Python function calls str.replace over and over.

A large part of the problem is clearly the use of a dict to perform lookups; as seen in the 1-1 test case, str.translate using the result of str.maketrans takes nearly twice as long as when it uses the result of a similar bytes.maketrans. While I have not profiled the code, I suspect the remainder is a combination of the overhead involving conversion of Py_UCS4 to PyLong for general purpose lookup, handling the multiple ways the translation target can be expressed (a Unicode ordinal or a str object), and the code that checks for and resizes the output buffer each time.

I have ideas for a solution involving a special purpose translation object to be produced by str.maketrans for cases where the translation table will produce an equal or smaller string (all to values are None or length 1 or lower strings), and the from values all fall in a small enough range (say, 1024) to avoid excessive memory usage from large translation table objects. But I'd like to know whether such a change (including modifying the behavior of str.maketrans to produce an unspecified mapping object, instead of the current dict) is desired, acceptable, what have you.
msg215301 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-01 08:23
str.translate() currently allocates a buffer of UCS4 characters.

translate_writer.patch:
- modify _PyUnicode_TranslateCharmap() to use the _PyUnicodeWriter API
- drop optimizations for error handlers different than "ignore" because there is no unit tests for them, and str.translate() uses "ignore". It's safer to drop untested optimization.
- cleanup also the code: charmaptranslate_output() is now responsible to handle charmaptranslate_lookup() result (to decrement the reference coutner)

str.translate() may be a little bit faster when translating ASCII to ASCII for large string, but not so much.

bytes.translate() is much faster because it builds a C array of 256 items to fast table lookup, whereas str.translate() requires a Python dict lookup for each character, which is much slower.

codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 8-bit encodings. We may reuse it for simple cases, like translating ASCII to ASCII.
msg215339 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-04-01 21:20
@haypo: Are you planning to run with this then? Or shall I pick up where your patch leaves off?

Thanks for the pointer to the codecs charmap_build; it's not documented anywhere that I can tell, so I didn't even know it existed. Now to dig into the source to figure out what it does and how to use it (both in C and Python).

I know about the fundamental design differences that makes bytes.translate kick str.translate's butt; my plan to make a special purpose mapping-like object that didn't require C layer code to wrap up Py_UCS4 in PyLong, or pull Py_UCS4 out of PyUnicode/PyLong objects was largely to try and get a bytes.translate like implementation, but if 
codecs.charmap_build()/PyUnicode_BuildEncodingMap() can get me 90% of the way there without reinventing the wheel, it's clearly worth looking at.

Again, let me know if it's okay if I try to finish off the work you began, or if you'd prefer to do it yourself.
msg215387 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-02 15:00
> bytes.translate() is much faster because it builds a C array of 256
> items to fast table lookup, whereas str.translate() requires a Python
> dict lookup for each character, which is much slower.

It should be easy to devise a simple hash table for the common case of one-to-one translation (and also deletion).
msg215473 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-04-03 21:50
Hmm... Maybe I'm testing it wrong, but I'm finding your patch is slowing down translation by a small amount on my test case; not a lot, maybe a 10% slowdown on enlarging translations, 6% for 1-1, and deletion in between.
msg215538 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 17:29
translate_script.py: microbenchmark for str.translate() with various length, charsets (ASCII, latin1, UCS2, UCS4) and percentage of replacement. Result:

---------------------------+----------+---------
Summary                    | original |   writer
---------------------------+------+---------
replace none, length=10    |  5.65 us |  5.67 us
replace none, length=10**3 |   489 us |   486 us
replace none, length=10**6 |   487 ms |   484 ms
replace 10%, length=10     |  5.47 us |  5.48 us
replace 10%, length=10**3  |   455 us |   455 us
replace 10%, length=10**6  |   455 ms |   455 ms
replace 50%, length=10     |  4.71 us |   4.7 us
replace 50%, length=10**3  |   372 us |   374 us
replace 50%, length=10**6  |   371 ms |   373 ms
replace 90%, length=10     |  3.88 us |  3.93 us
replace 90%, length=10**3  |   293 us |   297 us
replace 90%, length=10**6  |   293 ms |   295 ms
replace all, length=10     |  3.58 us |  3.59 us
replace all, length=10**3  |   263 us |   262 us
replace all, length=10**6  |   262 ms |   261 ms
---------------------------+----------+---------
Total                      | 1.87 sec | 1.87 sec
---------------------------+----------+---------

It's exactly the same with translate_writer.patch.
msg215539 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 17:30
translate_writer_bench.txt: full output of the micro-benchmark.
msg215540 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-04 17:35
> - drop optimizations for error handlers different than "ignore" because there is no unit tests for them, and str.translate() uses "ignore". It's safer to drop untested optimization.

It is unsafe to do such nontrivial modifications for untested code.
msg215541 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 17:39
Oh, I tested the wrong patch :-( Here is the updated benchmark summary:

---------------------------+--------------+---------------
Summary                    |     original |         writer
---------------------------+--------------+---------------
replace none, length=10    |  5.65 us (*) |        5.53 us
replace none, length=10**3 |   489 us (*) |         475 us
replace none, length=10**6 |   487 ms (*) |         476 ms
replace 10%, length=10     |  5.47 us (*) |        5.25 us
replace 10%, length=10**3  |   455 us (*) |         447 us
replace 10%, length=10**6  |   455 ms (*) |         447 ms
replace 50%, length=10     |  4.71 us (*) |  4.42 us (-6%)
replace 50%, length=10**3  |   372 us (*) |   352 us (-5%)
replace 50%, length=10**6  |   371 ms (*) |   352 ms (-5%)
replace 90%, length=10     |  3.88 us (*) |  3.53 us (-9%)
replace 90%, length=10**3  |   293 us (*) |   270 us (-8%)
replace 90%, length=10**6  |   293 ms (*) |   271 ms (-7%)
replace all, length=10     |  3.58 us (*) | 3.24 us (-10%)
replace all, length=10**3  |   263 us (*) |  236 us (-10%)
replace all, length=10**6  |   262 ms (*) |  235 ms (-10%)
---------------------------+--------------+---------------
Total                      | 1.87 sec (*) |       1.78 sec
---------------------------+--------------+---------------

str.translate() is a little bit faster with translate_writer.patch, but I don't really care if it's not faster, it's just to cleanup to code.
msg215551 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-04 19:02
New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
Issue #21118: Fix _PyUnicodeTranslateError_Create(), add missing format
http://hg.python.org/cpython/rev/95d4e8124c6a

New changeset 03b1dd29b327 by Victor Stinner in branch 'default':
Issue #21118: Use _PyUnicodeWriter API in str.translate() to simplify and
http://hg.python.org/cpython/rev/03b1dd29b327
msg215552 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 19:05
> New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
> Issue #21118: Fix _PyUnicodeTranslateError_Create(), add missing format
> http://hg.python.org/cpython/rev/95d4e8124c6a

It looks like _PyUnicode_TranslateCharmap() didn't work with error handler different than "strict", "replace", "ignore" and "xmlcharrefreplace": UnicodeTranslateError constructor crashed before my change.
msg215553 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-04 19:06
New changeset 70990e795657 by Victor Stinner in branch '3.4':
Issue #21118: Fix _PyUnicodeTranslateError_Create(), add missing format
http://hg.python.org/cpython/rev/70990e795657
msg215576 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 23:38
fast_translate.patch: here is the real patch which optimize translating ASCII to ASCII. It's *much* faster:

---------------------------+-------------+---------------
Tests                      |     writerA |          fastA
---------------------------+-------------+---------------
replace none, length=10    |  710 ns (*) |  169 ns (-76%)
replace none, length=10**3 | 59.1 us (*) |  997 ns (-98%)
replace none, length=10**6 | 59.2 ms (*) |  805 us (-99%)
replace 10%, length=10     |  678 ns (*) |  187 ns (-72%)
replace 10%, length=10**3  | 58.7 us (*) | 1.02 us (-98%)
replace 10%, length=10**6  | 57.5 ms (*) |  817 us (-99%)
replace 50%, length=10     |  559 ns (*) |  188 ns (-66%)
replace 50%, length=10**3  | 43.6 us (*) | 1.02 us (-98%)
replace 50%, length=10**6  | 43.4 ms (*) |  811 us (-98%)
replace 90%, length=10     |  437 ns (*) |  187 ns (-57%)
replace 90%, length=10**3  | 32.4 us (*) | 1.02 us (-97%)
replace 90%, length=10**6  | 31.8 ms (*) |  805 us (-97%)
replace all, length=10     |  386 ns (*) |  121 ns (-69%)
replace all, length=10**3  | 27.3 us (*) |  955 ns (-96%)
replace all, length=10**6  | 27.2 ms (*) |  807 us (-97%)
---------------------------+-------------+---------------
Total                      |  219 ms (*) | 4.05 ms (-98%)
---------------------------+-------------+---------------

I'm not sure yet that the patch doesn't make non-ASCII cases slower.
msg215577 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-04 23:43
translate_script_ascii.py: the benchmark script used to test fast_translate.patch.
msg215586 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-05 09:42
New changeset 9acc8196a82c by Victor Stinner in branch 'default':
Issue #21118: Remove unused variable
http://hg.python.org/cpython/rev/9acc8196a82c

New changeset bd594dd71d46 by Victor Stinner in branch 'default':
Issue #21118: Add more unit tests on str.translate()
http://hg.python.org/cpython/rev/bd594dd71d46
msg215587 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-05 09:56
New changeset cca6b056236a by Victor Stinner in branch 'default':
Issue #21118: Optimize str.translate() for ASCII => ASCII translation
http://hg.python.org/cpython/rev/cca6b056236a

New changeset 6a347c0ffbfc by Victor Stinner in branch 'default':
Issue #21118: Add unit test for invalid character replacement (code point higher than U+10ffff)
http://hg.python.org/cpython/rev/6a347c0ffbfc
msg215594 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-05 11:47
fast_translate.patch works only with ASCII input string and ASCII 1:1 mapping. Is this actually typical case?

Here is a patch which uses different approach. It caches values for ASCII keys. It works with all types of input strings and mappings and can speed up more use cases, including non-ASCII data, deletion and enlarging.

translate_timing.py results:

                                unpatched           patched
Testing 1-1 translation
str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797
Testing deletion
str.translate                   4.481863372000589   0.7718261509999138
Testing enlarging translations
str.translate                   4.421521270000085   0.9290620680003485

translate_script_ascii.py results:

---------------------------+---------------------------+-------------------------------
Tests                      | translate_script_ascii.34 | translate_script_ascii.cached3
---------------------------+---------------------------+-------------------------------
replace none, length=10    |           6.12 us (+176%) |                    2.22 us (*)
replace none, length=10**3 |           448 us (+1293%) |                    32.2 us (*)
replace none, length=10**6 |           474 ms (+1435%) |                    30.9 ms (*)
replace 10%, length=10     |           5.73 us (+133%) |                    2.46 us (*)
replace 10%, length=10**3  |           412 us (+1060%) |                    35.5 us (*)
replace 10%, length=10**6  |           442 ms (+1204%) |                    33.9 ms (*)
replace 50%, length=10     |            4.75 us (+85%) |                    2.57 us (*)
replace 50%, length=10**3  |            311 us (+552%) |                    47.7 us (*)
replace 50%, length=10**6  |            331 ms (+617%) |                    46.2 ms (*)
replace 90%, length=10     |            3.36 us (+29%) |                    2.59 us (*)
replace 90%, length=10**3  |            178 us (+250%) |                    50.8 us (*)
replace 90%, length=10**6  |            192 ms (+291%) |                    49.2 ms (*)
replace all, length=10     |            2.64 us (+28%) |                    2.06 us (*)
replace all, length=10**3  |            146 us (+189%) |                    50.3 us (*)
replace all, length=10**6  |            152 ms (+194%) |                    51.7 ms (*)
---------------------------+---------------------------+-------------------------------
Total                      |          1.59 sec (+651%) |                     212 ms (*)
---------------------------+---------------------------+-------------------------------
msg215598 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-05 12:27
New changeset 47b0c076e17d by Victor Stinner in branch 'default':
Issue #21118: Optimize also str.translate() for ASCII => ASCII deletion
http://hg.python.org/cpython/rev/47b0c076e17d
msg215599 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-05 12:49
Serhiy wrote:
> fast_translate.patch works only with ASCII input string and ASCII 1:1 mapping. Is this actually typical case?

I just checked the Python stdlib: as expected, all usages of
str.translate() except of email.quoprimime use ASCII 1:1. My
optimization is only used if the input string is ASCII, but I expect
that most strings are just ASCI.

** distutils: ASCII => ASCII (1:1)

   longopt_xlate = str.maketrans('-', '_')
and
   WS_TRANS = {ord(_wschar) : ' ' for _wschar in string.whitespace};
.. text = text.translate(WS_TRANS)

** email.quoprimes:

    encoded = header_bytes.decode('latin1').translate(_QUOPRI_HEADER_MAP)
and
    body = body.translate(_QUOPRI_BODY_ENCODE_MAP)

=> my optimization is used if the input string contains "safe header
characters" (a-z, A-Z, 0-9, space and "-!*+/"). It should be the
common case for emails.

** rot13 encoding: ASCII 1:1

** idlelib.PyParse: ASCII 1:1

        str = str.translate(_tran)

** textwrap: ASCII 1:1

            text = text.translate(self.unicode_whitespace_trans)

** zipfile: ASCII 1:1

        arcname = arcname.translate(table)
msg215600 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-05 12:51
Previous patch and results were against source code before committing fast_translate.patch. Here is a patch synchronized with current code.

translate_timing.py results:

                                unpatched           patched
Testing 1-1 translation
str.translate                   0.5265426559999469  0.6120695240006171
str.translate from bytes trans  0.2608634099997289  0.32327288200031035
Testing deletion
str.translate                   4.331346814999051   0.7960810519998631
Testing enlarging translations
str.translate                   4.392650978999882   0.9280614529998275

translate_script_ascii.py results:

---------------------------+------------------------------+-------------------------------
Tests                      | translate_script_ascii.fastA | translate_script_ascii.cached2
---------------------------+------------------------------+-------------------------------
replace none, length=10    |                  1.54 us (*) |                 2.07 us (+34%)
replace none, length=10**3 |                  10.5 us (*) |                        10.6 us
replace none, length=10**6 |                      12.6 ms |                    12.3 ms (*)
replace 10%, length=10     |                  1.69 us (*) |                 2.31 us (+37%)
replace 10%, length=10**3  |                  10.6 us (*) |                        10.7 us
replace 10%, length=10**6  |                      12.6 ms |                    12.1 ms (*)
replace 50%, length=10     |                  1.69 us (*) |                 2.31 us (+36%)
replace 50%, length=10**3  |                  10.6 us (*) |                        10.9 us
replace 50%, length=10**6  |                      12.6 ms |                    12.3 ms (*)
replace 90%, length=10     |                  1.69 us (*) |                 2.26 us (+34%)
replace 90%, length=10**3  |                  10.6 us (*) |                        10.7 us
replace 90%, length=10**6  |                      12.6 ms |                    12.2 ms (*)
replace all, length=10     |                  1.07 us (*) |                 1.68 us (+56%)
replace all, length=10**3  |                  9.28 us (*) |                        9.46 us
replace all, length=10**6  |                      12.6 ms |                      12 ms (*)
---------------------------+------------------------------+-------------------------------
Total                      |                        63 ms |                    60.9 ms (*)
---------------------------+------------------------------+-------------------------------

translate_cached_2.patch little slows down translation of short ASCII strings (due to cache initialization and freeing), but speeds up deletion and enlarging and works with non-ASCII data.
msg215602 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-05 13:03
bench_translate.py: benchmark ASCII 1:1 but also ASCII 1:1 with deletion. Results of the benchmark comparing tip (47b0c076e17d which includes my latest optimization on deletion) and 6a347c0ffbfc + translate_cached_2.patch.

Common platform:
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Platform: Linux-3.12.8-300.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
Bits: int=32, long=64, long long=64, size_t=64, void*=64
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Timer: time.perf_counter
Timer precision: 45 ns
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)

Platform of campaign remove:
SCM: hg revision=47b0c076e17d tag=tip branch=default date="2014-04-05 14:27 +0200"
Python version: 3.5.0a0 (default:47b0c076e17d, Apr 5 2014, 14:50:53) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
Date: 2014-04-05 14:51:55

Platform of campaign cache:
SCM: hg revision=6a347c0ffbfc+ branch=default date="2014-04-05 11:56 +0200"
Python version: 3.5.0a0 (default:6a347c0ffbfc+, Apr 5 2014, 14:53:02) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
Date: 2014-04-05 14:53:12

---------------------------+-------------+----------------
Tests                      |      remove |           cache
---------------------------+-------------+----------------
replace none, length=10    |  184 ns (*) |   275 ns (+50%)
replace none, length=10**3 | 1.06 us (*) |          1.1 us
replace none, length=10**6 |  827 us (*) |          792 us
replace 10%, length=10     |  207 ns (*) |   298 ns (+44%)
replace 10%, length=10**3  | 1.08 us (*) |         1.12 us
replace 10%, length=10**6  |  828 us (*) |          793 us
replace 50%, length=10     |  205 ns (*) |   298 ns (+46%)
replace 50%, length=10**3  | 1.08 us (*) |   1.17 us (+7%)
replace 50%, length=10**6  |  827 us (*) |          793 us
replace 90%, length=10     |  208 ns (*) |   298 ns (+44%)
replace 90%, length=10**3  | 1.09 us (*) |         1.13 us
replace 90%, length=10**6  |  850 us (*) |    793 us (-7%)
replace all, length=10     |  145 ns (*) |   226 ns (+56%)
replace all, length=10**3  | 1.03 us (*) |         1.04 us
replace all, length=10**6  |  827 us (*) |          792 us
remove none, length=10     |  184 ns (*) |   274 ns (+49%)
remove none, length=10**3  | 1.07 us (*) |         1.09 us
remove none, length=10**6  |  836 us (*) |    793 us (-5%)
remove 10%, length=10      |  223 ns (*) |   408 ns (+83%)
remove 10%, length=10**3   | 1.45 us (*) | 9.13 us (+531%)
remove 10%, length=10**6   | 1.08 ms (*) | 8.73 ms (+706%)
remove 50%, length=10      |  221 ns (*) |   407 ns (+84%)
remove 50%, length=10**3   | 1.23 us (*) | 8.28 us (+575%)
remove 50%, length=10**6   |  948 us (*) |  7.9 ms (+734%)
remove 90%, length=10      |  230 ns (*) |   375 ns (+63%)
remove 90%, length=10**3   | 1.57 us (*) | 3.86 us (+145%)
remove 90%, length=10**6   | 1.28 ms (*) | 3.49 ms (+173%)
remove all, length=10      |  139 ns (*) |   266 ns (+92%)
remove all, length=10**3   | 1.24 us (*) |  2.46 us (+99%)
remove all, length=10**6   | 1.07 ms (*) | 2.13 ms (+100%)
---------------------------+-------------+----------------
Total                      | 9.38 ms (*) |   27 ms (+188%)
---------------------------+-------------+----------------

You patch is always slower for the common case (ASCII => ASCII translation).

I implemented the most obvious optimization for the most common case (ASCII 1:1 and ASCII 1:1 with deletion). I consider that the current code is enough to close this issue.

@Josh Rosenberg: Thanks for the report. The current implementation should be almost as fast as bytes.translate() (the "60x" factor you mentionned in the title) for ASCII 1:1 mapping.

--

Serhiy: If you are interested to optimize str.translate() for the general case (larger charset), please open a new issue. It will probably require more complex "cache". You may take a look at charmap codec which has such more complex cache (cache with 3 levels), see my message msg215301.

IMO it's not interesting to invest time on optimizing str.translate(), it's not a common function. It took some years before an user run a benchmark on it :-)
msg215606 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-05 13:35
New changeset e56c71c6b45e by Victor Stinner in branch 'default':
Issue #21118: str.translate() now raises a ValueError, not a TypeError, if the
http://hg.python.org/cpython/rev/e56c71c6b45e
msg215610 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-05 13:57
субота, 05-кві-2014 12:49:22 ви написали:
> STINNER Victor added the comment:
> 
> Serhiy wrote:
> > fast_translate.patch works only with ASCII input string and ASCII 1:1
> > mapping. Is this actually typical case?
> I just checked the Python stdlib: as expected, all usages of
> str.translate() except of email.quoprimime use ASCII 1:1.

Because str.translate() is much slower than a series of str.replace() (which 
already is optimized), some usages of str.translate() was rewritten to use 
str.replace(). See for example html.escape(). This is about what this issue.

> My
> optimization is only used if the input string is ASCII, but I expect
> that most strings are just ASCI.

In most (if not all) these cases input string can be non-ASCII.

> bench_translate.py: benchmark ASCII 1:1 but also ASCII 1:1 with deletion.

Could you please provide bench_translate.py?

> It will probably require more complex "cache". You may take a look at
> charmap codec which has such more complex cache (cache with 3 levels), see
> my message msg215301.

I were going to do this on next step. Full cache can grow up to 1114112 
characters, so I planned to cache only BMP characters (cache with 2 levels).

You commit too fast, I am late for you. ;)
msg215612 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-05 14:00
Do you plan to backport bug fixes to 3.3 and 2.7?
msg215673 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-04-07 00:39
Thanks for addressing this so fast. Annoyingly, I suspect it will not help the original case that led me to finding the slowdown (I had some code that was translating from 56 latin-1 Romance characters with diacritics to the equivalent ASCII characters, so it was 1-1, but it was always mapping from non-ASCII to ASCII, and therefore won't benefit from a change that only caches code points 0-127).

That said, every *other* time I've used str.translate has been in an ASCII->ASCII scenario, where this *will* help. So again, thank you.
msg215676 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-07 08:34
> Could you please provide bench_translate.py?

Oh, I forgot to add it :-(
msg215678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-07 09:12
> Do you plan to backport bug fixes to 3.3 and 2.7?

Which commit?

3.3 is no more open to bug fixes, only to security fixes.

> New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
> Issue #21118: Fix _PyUnicodeTranslateError_Create(), add missing format
> http://hg.python.org/cpython/rev/95d4e8124c6a

Python 2.7 is not affected by this issue.
msg215680 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-07 09:24
I opened the issue #21165 to discuss a more generic optimization of str.translate().
msg215685 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-04-07 09:51
> > Do you plan to backport bug fixes to 3.3 and 2.7?
> Which commit?

A fix of _PyUnicode_TranslateCharmap().

> 3.3 is no more open to bug fixes, only to security fixes.

Oh, I meant 3.4.
msg215686 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-07 09:56
> A fix of _PyUnicode_TranslateCharmap().

Sorry, I don't see which commit is a bugfix and is not backported to Python 3.4 yet.
msg215742 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-04-08 07:14
New changeset cb632988bc09 by Victor Stinner in branch 'default':
Issue #21118: PyLong_AS_LONG() result type is long
http://hg.python.org/cpython/rev/cb632988bc09
History
Date User Action Args
2014-04-08 07:14:31python-devsetmessages: + msg215742
2014-04-07 09:56:38vstinnersetmessages: + msg215686
2014-04-07 09:51:52serhiy.storchakasetmessages: + msg215685
2014-04-07 09:24:13vstinnersetmessages: + msg215680
2014-04-07 09:12:51vstinnersetmessages: + msg215678
2014-04-07 08:34:41vstinnersetfiles: + bench_translate.py

messages: + msg215676
2014-04-07 00:39:13josh.rsetmessages: + msg215673
2014-04-05 14:00:00serhiy.storchakasetmessages: + msg215612
2014-04-05 13:57:16serhiy.storchakasetmessages: + msg215610
2014-04-05 13:35:18python-devsetmessages: + msg215606
2014-04-05 13:03:50vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg215602
2014-04-05 12:51:09serhiy.storchakasetfiles: + translate_cached_2.patch

messages: + msg215600
2014-04-05 12:49:22vstinnersetmessages: + msg215599
2014-04-05 12:27:29python-devsetmessages: + msg215598
2014-04-05 11:47:43serhiy.storchakasetfiles: + translate_cached.patch

messages: + msg215594
2014-04-05 09:56:51python-devsetmessages: + msg215587
2014-04-05 09:42:32python-devsetmessages: + msg215586
2014-04-04 23:43:34vstinnersetfiles: + translate_script_ascii.py

messages: + msg215577
2014-04-04 23:38:29vstinnersetfiles: + fast_translate.patch

messages: + msg215576
2014-04-04 19:06:00python-devsetmessages: + msg215553
2014-04-04 19:05:00vstinnersetmessages: + msg215552
2014-04-04 19:02:53python-devsetnosy: + python-dev
messages: + msg215551
2014-04-04 17:39:32vstinnersetmessages: + msg215541
2014-04-04 17:35:53serhiy.storchakasetmessages: + msg215540
2014-04-04 17:30:07vstinnersetfiles: + translate_writer_bench.txt

messages: + msg215539
2014-04-04 17:29:34vstinnersetfiles: + translate_script.py

messages: + msg215538
2014-04-03 21:50:40josh.rsetmessages: + msg215473
2014-04-02 15:00:33pitrousetnosy: + pitrou
messages: + msg215387
2014-04-01 21:20:24josh.rsetmessages: + msg215339
2014-04-01 08:55:29serhiy.storchakasetstage: patch review
2014-04-01 08:23:37vstinnersetfiles: + translate_writer.patch
keywords: + patch
messages: + msg215301
2014-03-31 23:32:32josh.rsettitle: str.translate is absurdly slow in majority of use cases (takes 3x to 10x longer than similar functions) -> str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions)
2014-03-31 23:09:47vstinnersetnosy: + serhiy.storchaka
2014-03-31 23:09:06josh.rcreate