str.translate is absurdly slow in majority of use cases (takes up to 60x longer than similar functions) #65317
String translation functions, at least in other languages, are typically a performance optimization for when you *really* need the speed. In exchange for the memory cost and one-time work of creating a translation table, you get much faster 1:1 replacement and deletion of characters. In Python 3, str.translate is usually a performance pessimization, not an optimization.

It does do a few things similar functions (e.g. str.replace) can't: it can perform a translation where the inputs and outputs overlap ('abc' -> 'bca' wouldn't work with chained str.replace), and it handles the full Unicode range ('\u0f01\u0f02' -> '\u0f03\u0f04' can't be done by encoding, using bytes.translate, then decoding again). But whenever another approach is available, str.translate is almost always the slowest. It may not be practical to make the general case of str.translate match the performance of bytes.translate, but it should at least be possible to improve on it in many common cases.

My timing data (see the attached file for how it was produced):

    $ python.exe translate_timing.py
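The attached script and its output are not reproduced here; as a rough sketch of the kind of comparison involved (illustrative names and data, not the original translate_timing.py):

    # Illustrative timing comparison, not the original translate_timing.py:
    # str.translate vs. a chain of str.replace calls for a 1:1 mapping.
    import timeit

    text = 'abcdefghijklmnopqrstuvwxyz' * 1000   # plain ASCII input
    table = str.maketrans('abc', 'xyz')          # 1:1 translation table

    def with_translate(s=text, t=table):
        return s.translate(t)

    def with_replace(s=text):
        # Chaining replace() works here because sources and targets
        # don't overlap; 'abc' -> 'bca' could not be done this way.
        return s.replace('a', 'x').replace('b', 'y').replace('c', 'z')

    print('translate:', timeit.timeit(with_translate, number=1000))
    print('replace:  ', timeit.timeit(with_replace, number=1000))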
Obviously, this is somewhat contrived, but it illustrates my point that the common cases (ASCII or latin-1 strings) are far too slow. In every case, str.translate loses to every competitor, including itself when passed a translation table produced by bytes.translate, and including the case where a pure Python function calls str.replace over and over.

A large part of the problem is clearly the use of a dict to perform lookups; as seen in the 1:1 test case, str.translate using the result of str.maketrans takes nearly twice as long as when it uses the result of a similar bytes.maketrans. While I have not profiled the code, I suspect the remainder is a combination of the overhead of converting Py_UCS4 to PyLong for the general-purpose lookup, handling the multiple ways the translation target can be expressed (a Unicode ordinal or a str object), and the code that checks for and resizes the output buffer each time.

I have an idea for a solution involving a special-purpose translation object, produced by str.maketrans, for cases where the translation will produce an equal or smaller string (all target values are None or strings of length <= 1) and the source values all fall in a small enough range (say, 1024) to avoid excessive memory usage from large translation table objects. But I'd like to know whether such a change (including modifying the behavior of str.maketrans to produce an unspecified mapping object instead of the current dict) is desired, acceptable, what have you.
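For context on the dict point above: str.maketrans builds an ordinary dict keyed by code points, while bytes.maketrans builds a flat 256-byte lookup table. A quick interpreter session shows the difference:

    >>> str.maketrans('abc', 'xyz')
    {97: 120, 98: 121, 99: 122}
    >>> str.maketrans({'a': None})            # deletion target
    {97: None}
    >>> len(bytes.maketrans(b'abc', b'xyz'))  # dense 256-entry table
    256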
str.translate() currently allocates a buffer of UCS4 characters. translate_writer.patch:
str.translate() may be a little bit faster when translating ASCII to ASCII for a large string, but not by much. bytes.translate() is much faster because it builds a C array of 256 items for fast table lookup, whereas str.translate() requires a Python dict lookup for each character, which is much slower. codecs.charmap_build() (PyUnicode_BuildEncodingMap()) creates a C array ("a three-level trie") for fast lookup. It is used with codecs.charmap_encode() for 8-bit encodings. We may be able to reuse it for simple cases, like translating ASCII to ASCII.
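For reference, the undocumented pair is used by the generated 8-bit codecs (e.g. Lib/encodings/cp1252.py) roughly as below; the exact calling convention is inferred from those modules rather than from any documentation:

    import codecs

    # A 256-character string mapping each byte value to a character
    # (identity here; real codecs substitute their own table).
    decoding_table = ''.join(chr(i) for i in range(256))

    # Build the fast reverse-lookup object ("a three-level trie").
    encoding_table = codecs.charmap_build(decoding_table)

    # charmap_encode() uses it to map characters back to bytes.
    data, consumed = codecs.charmap_encode('abc', 'strict', encoding_table)
    print(data)  # b'abc'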
@Haypo: Are you planning to run with this, then? Or shall I pick up where your patch leaves off? Thanks for the pointer to codecs.charmap_build; it isn't documented anywhere that I can tell, so I didn't even know it existed. Now to dig into the source to figure out what it does and how to use it (both in C and Python). I know about the fundamental design differences that make bytes.translate kick str.translate's butt; my plan to make a special-purpose mapping-like object that wouldn't require the C layer to wrap Py_UCS4 in PyLong, or pull Py_UCS4 out of PyUnicode/PyLong objects, was largely an attempt to get a bytes.translate-like implementation. Again, let me know if it's okay for me to try to finish off the work you began, or if you'd prefer to do it yourself.
It should be easy to devise a simple hash table for the common case of one-to-one translation (and also deletion). |
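A minimal pure-Python sketch of that idea, using a dense array instead of a hash table and restricted to small code points (all names here are illustrative, and values are assumed to be ordinals or None, as the two-argument form of str.maketrans produces):

    # Sketch: precompute a dense table for code points below a bound,
    # supporting 1:1 replacement and deletion (mapped to '').
    BOUND = 1024

    def compile_table(mapping, bound=BOUND):
        table = [chr(cp) for cp in range(bound)]   # identity by default
        for src, dst in mapping.items():
            if src < bound:
                table[src] = '' if dst is None else chr(dst)
        return table

    def translate_small(s, table, bound=BOUND):
        # Unmapped or out-of-range characters pass through unchanged.
        return ''.join(table[ord(c)] if ord(c) < bound else c for c in s)

    mapping = str.maketrans('abc', 'xyz')  # 1:1 replacements
    mapping[ord('d')] = None               # plus one deletion
    print(translate_small('abcdef', compile_table(mapping)))  # xyzef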
Hmm... Maybe I'm testing it wrong, but I'm finding your patch slows translation down by a small amount on my test case; not a lot: maybe a 10% slowdown on enlarging translations, 6% for 1:1, and deletion somewhere in between.
translate_script.py: a microbenchmark for str.translate() with various lengths, charsets (ASCII, latin1, UCS2, UCS4) and percentages of replacement. Result: [benchmark table not preserved] It's exactly the same with translate_writer.patch.
translate_writer_bench.txt: full output of the micro-benchmark. |
It is unsafe to make such nontrivial modifications to untested code.
Oh, I tested the wrong patch :-( Here is the updated benchmark summary: [benchmark table not preserved] str.translate() is a little bit faster with translate_writer.patch, but I don't really care if it's not faster; the patch is just a cleanup of the code.
New changeset 95d4e8124c6a by Victor Stinner in branch 'default':
New changeset 03b1dd29b327 by Victor Stinner in branch 'default':
It looks like _PyUnicode_TranslateCharmap() didn't work with error handlers other than "strict", "replace", "ignore" and "xmlcharrefreplace": the UnicodeTranslateError constructor crashed before my change.
New changeset 70990e795657 by Victor Stinner in branch '3.4': |
fast_translate.patch: here is the real patch, which optimizes translating ASCII to ASCII. It's *much* faster: [benchmark table not preserved] I'm not sure yet that the patch doesn't make non-ASCII cases slower.
translate_script_ascii.py: the benchmark script used to test fast_translate.patch. |
New changeset 9acc8196a82c by Victor Stinner in branch 'default':
New changeset bd594dd71d46 by Victor Stinner in branch 'default':
New changeset cca6b056236a by Victor Stinner in branch 'default':
New changeset 6a347c0ffbfc by Victor Stinner in branch 'default':
fast_translate.patch works only with an ASCII input string and an ASCII 1:1 mapping. Is that actually the typical case? Here is a patch which uses a different approach: it caches values for ASCII keys. It works with all types of input strings and mappings, and can speed up more use cases, including non-ASCII data, deletion and enlarging. translate_timing.py results:
Testing 1-1 translation. translate_script_ascii.py results: [benchmark table not preserved]
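A rough pure-Python analogue of that caching strategy (the patch itself does this in C; these names are illustrative): resolve each ASCII key through the mapping once per call, then reuse the cached answer.

    def lookup(mapping, cp):
        # str.translate() semantics: unmapped code points pass through,
        # None deletes, and values may be ordinals or strings.
        try:
            dst = mapping[cp]
        except LookupError:
            return chr(cp)
        if dst is None:
            return ''
        return chr(dst) if isinstance(dst, int) else dst

    def translate_cached(s, mapping):
        cache = {}   # per-call cache, filled lazily for ASCII keys only
        parts = []
        for c in s:
            cp = ord(c)
            if cp < 128:
                if cp not in cache:
                    cache[cp] = lookup(mapping, cp)
                parts.append(cache[cp])
            else:
                parts.append(lookup(mapping, cp))  # uncached slow path
        return ''.join(parts)

    print(translate_cached('abcd', str.maketrans('abc', 'xyz')))  # xyzd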
New changeset 47b0c076e17d by Victor Stinner in branch 'default': |
Serhiy wrote:
I just checked the Python stdlib: as expected, all usages of str.translate() fall into simple cases:

** distutils: ASCII => ASCII (1:1)

    longopt_xlate = str.maketrans('-', '_')

and

    WS_TRANS = {ord(_wschar): ' ' for _wschar in string.whitespace}
    ...
    text = text.translate(WS_TRANS)

** email.quoprimime:

    encoded = header_bytes.decode('latin1').translate(_QUOPRI_HEADER_MAP)

and

    body = body.translate(_QUOPRI_BODY_ENCODE_MAP)

=> my optimization is used if the input string contains only "safe header" characters

** rot13 encoding: ASCII 1:1

** idlelib.PyParse: ASCII 1:1

    str = str.translate(_tran)

** textwrap: ASCII 1:1

    text = text.translate(self.unicode_whitespace_trans)

** zipfile: ASCII 1:1

    arcname = arcname.translate(table)
The previous patch and results were against the source code before fast_translate.patch was committed. Here is a patch synchronized with the current code. translate_timing.py results:
Testing 1-1 translation. translate_script_ascii.py results: [benchmark table not preserved] translate_cached_2.patch slightly slows down translation of short ASCII strings (due to cache initialization and freeing), but speeds up deletion and enlarging, and works with non-ASCII data.
bench_translate.py: benchmarks ASCII 1:1, and also ASCII 1:1 with deletion. Results of the benchmark comparing tip (47b0c076e17d, which includes my latest optimization for deletion) and 6a347c0ffbfc + translate_cached_2.patch: [platform details and benchmark table not preserved]

Your patch is always slower for the common case (ASCII => ASCII translation). I implemented the most obvious optimization for the most common cases (ASCII 1:1 and ASCII 1:1 with deletion). I consider the current code enough to close this issue.

@Josh Rosenberg: Thanks for the report. The current implementation should be almost as fast as bytes.translate() (the "60x" factor you mentioned in the title) for an ASCII 1:1 mapping.

Serhiy: If you are interested in optimizing str.translate() for the general case (a larger charset), please open a new issue. It will probably require a more complex "cache". You may take a look at the charmap codec, which has such a cache (three levels); see my message msg215301. IMO it's not interesting to invest time in optimizing str.translate(); it's not a common function. It took some years before a user ran a benchmark on it :-)
New changeset e56c71c6b45e by Victor Stinner in branch 'default': |
On Saturday, 05 Apr 2014 at 12:49:22, you wrote:
Because str.translate() is much slower than a series of str.replace() (which
In most (if not all) of these cases the input string can be non-ASCII.
Could you please provide bench_translate.py?
I was going to do that as a next step. A full cache can grow to up to 1114112 entries (0x110000, one per possible Unicode code point). You commit too fast; I can't keep up with you. ;)
Do you plan to backport bug fixes to 3.3 and 2.7? |
Thanks for addressing this so fast. Annoyingly, I suspect it will not help the original case that led me to find the slowdown: I had code translating 56 latin-1 Romance characters with diacritics to their equivalent ASCII characters, so it was 1:1, but it always mapped from non-ASCII to ASCII, and therefore won't benefit from a change that only caches code points 0-127. That said, every *other* time I've used str.translate it has been in an ASCII->ASCII scenario, where this *will* help. So again, thank you.
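That kind of table looks roughly like this (a small illustrative subset, not the original 56-character table); every source code point is above 127, so an ASCII-keyed cache never gets a hit:

    # Illustrative latin-1 -> ASCII folding; the real table had 56 pairs.
    fold = str.maketrans('àáâãäåçèéêë', 'aaaaaaceeee')
    print('café crème'.translate(fold))  # cafe creme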
Oh, I forgot to add it :-( |
Which commit? 3.3 is no longer open to bug fixes, only security fixes.
Python 2.7 is not affected by this issue. |
I opened the issue bpo-21165 to discuss a more generic optimization of str.translate(). |
A fix of _PyUnicode_TranslateCharmap().
Oh, I meant 3.4. |
Sorry, I don't see which commit is a bug fix that hasn't yet been backported to Python 3.4.
New changeset cb632988bc09 by Victor Stinner in branch 'default': |