str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions) #65317
String translation functions, at least in other languages, are typically a performance optimization for when you *really* need the speed. In exchange for the memory cost and one-time work of creating a translation table, you get much faster 1:1 replacement and deletion of characters. In Python 3, str.translate is usually a performance pessimization, not an optimization.

It does do a few things similar functions (e.g. str.replace) can't: it can perform a translation where the inputs and outputs overlap ('abc' -> 'bca' wouldn't work with chained str.replace), and it handles the full Unicode range ('\u0f01\u0f02' -> '\u0f03\u0f04' can't be done by encoding, using bytes.translate, then decoding again). But whenever another approach is available, str.translate is almost always the slowest. It may not be practical to make the general case of str.translate match the performance of bytes.translate, but it should at least be possible to improve on it in many common cases.

My timing data (see the attached file for how it was produced):

    $ python.exe translate_timing.py
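The attached script and its output are not reproduced here; as a rough sketch of the kind of comparison involved (illustrative names and data, not the original translate_timing.py):

    # Illustrative timing comparison, not the original translate_timing.py:
    # str.translate vs. a chain of str.replace calls for a 1:1 mapping.
    import timeit

    text = 'abcdefghijklmnopqrstuvwxyz' * 1000   # plain ASCII input
    table = str.maketrans('abc', 'xyz')          # 1:1 translation table

    def with_translate(s=text, t=table):
        return s.translate(t)

    def with_replace(s=text):
        # Chaining replace() works here because sources and targets
        # don't overlap; 'abc' -> 'bca' could not be done this way.
        return s.replace('a', 'x').replace('b', 'y').replace('c', 'z')

    print('translate:', timeit.timeit(with_translate, number=1000))
    print('replace:  ', timeit.timeit(with_replace, number=1000))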
Obviously, this is somewhat contrived, but it illustrates my point that the common cases (ASCII or latin-1 strings) are far too slow. In every case, str.translate loses to every competitor, including itself when passed a translation table produced by bytes.translate, and including the case where a pure Python function calls str.replace over and over.

A large part of the problem is clearly the use of a dict to perform lookups; as seen in the 1:1 test case, str.translate using the result of str.maketrans takes nearly twice as long as when it uses the result of a similar bytes.maketrans. While I have not profiled the code, I suspect the remainder is a combination of the overhead of converting Py_UCS4 to PyLong for the general-purpose lookup, handling the multiple ways the translation target can be expressed (a Unicode ordinal or a str object), and the code that checks for and resizes the output buffer each time.

I have an idea for a solution involving a special-purpose translation object, produced by str.maketrans, for cases where the translation will produce an equal or smaller string (all target values are None or strings of length <= 1) and the source values all fall in a small enough range (say, 1024) to avoid excessive memory usage from large translation table objects. But I'd like to know whether such a change (including modifying the behavior of str.maketrans to produce an unspecified mapping object instead of the current dict) is desired, acceptable, what have you.
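For context on the dict point above: str.maketrans builds an ordinary dict keyed by code points, while bytes.maketrans builds a flat 256-byte lookup table. A quick interpreter session shows the difference:

    >>> str.maketrans('abc', 'xyz')
    {97: 120, 98: 121, 99: 122}
    >>> str.maketrans({'a': None})            # deletion target
    {97: None}
    >>> len(bytes.maketrans(b'abc', b'xyz'))  # dense 256-entry table
    256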
str.translate() currently allocates a buffer of UCS4 characters. translate_writer.patch:
str.translate() may be a little bit faster when translating ASCII to ASCII for a large string, but not by much. bytes.translate() is much faster because it builds a C array of 256 items for fast table lookup, whereas str.translate() requires a Python dict lookup for each character, which is much slower. codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 8-bit encodings. We may be able to reuse it for simple cases, like translating ASCII to ASCII.
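For reference, the undocumented pair is used by the generated 8-bit codecs (e.g. Lib/encodings/cp1252.py) roughly as below; the exact calling convention is inferred from those modules rather than from any documentation:

    import codecs

    # A 256-character string mapping each byte value to a character
    # (identity here; real codecs substitute their own table).
    decoding_table = ''.join(chr(i) for i in range(256))

    # Build the fast reverse-lookup object ("a three-level trie").
    encoding_table = codecs.charmap_build(decoding_table)

    # charmap_encode() uses it to map characters back to bytes.
    data, consumed = codecs.charmap_encode('abc', 'strict', encoding_table)
    print(data)  # b'abc'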
@Haypo: Are you planning to run with this, then? Or shall I pick up where your patch leaves off? Thanks for the pointer to codecs.charmap_build; it isn't documented anywhere that I can tell, so I didn't even know it existed. Now to dig into the source to figure out what it does and how to use it (both in C and Python). I know about the fundamental design differences that make bytes.translate kick str.translate's butt; my plan to make a special-purpose mapping-like object that wouldn't require the C layer to wrap Py_UCS4 in PyLong, or pull Py_UCS4 out of PyUnicode/PyLong objects, was largely an attempt to get a bytes.translate-like implementation. Again, let me know if it's okay for me to try to finish off the work you began, or if you'd prefer to do it yourself.
It should be easy to devise a simple hash table for the common case of one-to-one translation (and also deletion). |
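A minimal pure-Python sketch of that idea, using a dense array instead of a hash table and restricted to small code points (all names here are illustrative, and values are assumed to be ordinals or None, as the two-argument form of str.maketrans produces):

    # Sketch: precompute a dense table for code points below a bound,
    # supporting 1:1 replacement and deletion (mapped to '').
    BOUND = 1024

    def compile_table(mapping, bound=BOUND):
        table = [chr(cp) for cp in range(bound)]   # identity by default
        for src, dst in mapping.items():
            if src < bound:
                table[src] = '' if dst is None else chr(dst)
        return table

    def translate_small(s, table, bound=BOUND):
        # Unmapped or out-of-range characters pass through unchanged.
        return ''.join(table[ord(c)] if ord(c) < bound else c for c in s)

    mapping = str.maketrans('abc', 'xyz')  # 1:1 replacements
    mapping[ord('d')] = None               # plus one deletion
    print(translate_small('abcdef', compile_table(mapping)))  # xyzef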
Hmm... Maybe I'm testing it wrong, but I'm finding your patch slows translation down by a small amount on my test case; not a lot: maybe a 10% slowdown on enlarging translations, 6% for 1:1, and deletion somewhere in between.
translate_script.py: a microbenchmark for str.translate() with various lengths, charsets (ASCII, latin1, UCS2, UCS4) and percentages of replacement. Result: [benchmark table not preserved] It's exactly the same with translate_writer.patch.
translate_writer_bench.txt: full output of the micro-benchmark. |
It is unsafe to make such nontrivial modifications to untested code.
Oh, I tested the wrong patch :-( Here is the updated benchmark summary: [benchmark table not preserved] str.translate() is a little bit faster with translate_writer.patch, but I don't really care if it's not faster; the patch is just a cleanup of the code.
New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
New changeset 03b1dd29b327 by Victor Stinner in branch 'default':
It looks like _PyUnicode_TranslateCharmap() didn't work with error handlers other than "strict", "replace", "ignore" and "xmlcharrefreplace": the UnicodeTranslateError constructor crashed before my change.
New changeset 70990e795657 by Victor Stinner in branch '3.4': |
fast_translate.patch: here is the real patch, which optimizes translating ASCII to ASCII. It's *much* faster: [benchmark table not preserved] I'm not sure yet that the patch doesn't make non-ASCII cases slower.
translate_script_ascii.py: the benchmark script used to test fast_translate.patch. |
New changeset 9acc8196a82c by Victor Stinner in branch 'default':
New changeset bd594dd71d46 by Victor Stinner in branch 'default':
New changeset cca6b056236a by Victor Stinner in branch 'default':
New changeset 6a347c0ffbfc by Victor Stinner in branch 'default':
fast_translate.patch works only with an ASCII input string and an ASCII 1:1 mapping. Is that actually the typical case? Here is a patch which uses a different approach: it caches values for ASCII keys. It works with all types of input strings and mappings, and can speed up more use cases, including non-ASCII data, deletion and enlarging. translate_timing.py results:
Testing 1-1 translation. translate_script_ascii.py results: [benchmark table not preserved]
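A rough pure-Python analogue of that caching strategy (the patch itself does this in C; these names are illustrative): resolve each ASCII key through the mapping once per call, then reuse the cached answer.

    def lookup(mapping, cp):
        # str.translate() semantics: unmapped code points pass through,
        # None deletes, and values may be ordinals or strings.
        try:
            dst = mapping[cp]
        except LookupError:
            return chr(cp)
        if dst is None:
            return ''
        return chr(dst) if isinstance(dst, int) else dst

    def translate_cached(s, mapping):
        cache = {}   # per-call cache, filled lazily for ASCII keys only
        parts = []
        for c in s:
            cp = ord(c)
            if cp < 128:
                if cp not in cache:
                    cache[cp] = lookup(mapping, cp)
                parts.append(cache[cp])
            else:
                parts.append(lookup(mapping, cp))  # uncached slow path
        return ''.join(parts)

    print(translate_cached('abcd', str.maketrans('abc', 'xyz')))  # xyzd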
New changeset 47b0c076e17d by Victor Stinner in branch 'default': |
Serhiy wrote:
I just checked the Python stdlib: as expected, all usages of str.translate() fall into simple cases:

** distutils: ASCII => ASCII (1:1)

    longopt_xlate = str.maketrans('-', '_')

and

    WS_TRANS = {ord(_wschar): ' ' for _wschar in string.whitespace}
    ...
    text = text.translate(WS_TRANS)

** email.quoprimime:

    encoded = header_bytes.decode('latin1').translate(_QUOPRI_HEADER_MAP)

and

    body = body.translate(_QUOPRI_BODY_ENCODE_MAP)

=> my optimization is used if the input string contains only "safe header" characters

** rot13 encoding: ASCII 1:1

** idlelib.PyParse: ASCII 1:1

    str = str.translate(_tran)

** textwrap: ASCII 1:1

    text = text.translate(self.unicode_whitespace_trans)

** zipfile: ASCII 1:1

    arcname = arcname.translate(table)
The previous patch and results were against the source code before fast_translate.patch was committed. Here is a patch synchronized with the current code. translate_timing.py results:
Testing 1-1 translation. translate_script_ascii.py results: [benchmark table not preserved] translate_cached_2.patch slightly slows down translation of short ASCII strings (due to cache initialization and freeing), but speeds up deletion and enlarging, and works with non-ASCII data.
bench_translate.py: benchmarks ASCII 1:1, and also ASCII 1:1 with deletion. Results of the benchmark comparing tip (47b0c076e17d, which includes my latest optimization for deletion) and 6a347c0ffbfc + translate_cached_2.patch: [platform details and benchmark table not preserved]

Your patch is always slower for the common case (ASCII => ASCII translation). I implemented the most obvious optimization for the most common cases (ASCII 1:1 and ASCII 1:1 with deletion). I consider the current code enough to close this issue.

@Josh Rosenberg: Thanks for the report. The current implementation should be almost as fast as bytes.translate() (the "60x" factor you mentioned in the title) for an ASCII 1:1 mapping.

Serhiy: If you are interested in optimizing str.translate() for the general case (a larger charset), please open a new issue. It will probably require a more complex "cache". You may take a look at the charmap codec, which has such a cache (three levels); see my message msg215301. IMO it's not interesting to invest time in optimizing str.translate(); it's not a common function. It took some years before a user ran a benchmark on it :-)
New changeset e56c71c6b45e by Victor Stinner in branch 'default': |
On Saturday, 05 Apr 2014 at 12:49:22, you wrote:
Because str.translate() is much slower than a series of str.replace() (which
In most (if not all) of these cases the input string can be non-ASCII.
Could you please provide bench_translate.py?
I was going to do that as a next step. A full cache can grow to up to 1114112 entries (0x110000, one per possible Unicode code point). You commit too fast; I can't keep up with you. ;)
Do you plan to backport bug fixes to 3.3 and 2.7? |
Thanks for addressing this so fast. Annoyingly, I suspect it will not help the original case that led me to find the slowdown: I had code translating 56 latin-1 Romance characters with diacritics to their equivalent ASCII characters, so it was 1:1, but it always mapped from non-ASCII to ASCII, and therefore won't benefit from a change that only caches code points 0-127. That said, every *other* time I've used str.translate it has been in an ASCII->ASCII scenario, where this *will* help. So again, thank you.
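That kind of table looks roughly like this (a small illustrative subset, not the original 56-character table); every source code point is above 127, so an ASCII-keyed cache never gets a hit:

    # Illustrative latin-1 -> ASCII folding; the real table had 56 pairs.
    fold = str.maketrans('àáâãäåçèéêë', 'aaaaaaceeee')
    print('café crème'.translate(fold))  # cafe creme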
Oh, I forgot to add it :-( |
Which commit? 3.3 is no longer open to bug fixes, only security fixes.
Python 2.7 is not affected by this issue. |
I opened the issue bpo-21165 to discuss a more generic optimization of str.translate(). |
A fix of _PyUnicode_TranslateCharmap().
Oh, I meant 3.4. |
Sorry, I don't see which commit is a bug fix that hasn't yet been backported to Python 3.4.
New changeset cb632988bc09 by Victor Stinner in branch 'default': |