classification
Title: Faster utf-8 decoding
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, eric.araujo, jcea, loewis, pitrou, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2012-04-23 21:04 by serhiy.storchaka, last changed 2012-05-08 05:02 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
decode_utf8.patch serhiy.storchaka, 2012-04-23 21:04 review
utf8-signed.diff serhiy.storchaka, 2012-04-24 08:37 Non-recomended hacking review
decode_utf8_2.patch serhiy.storchaka, 2012-04-24 17:46 review
decode_utf8_3.patch serhiy.storchaka, 2012-04-24 17:46 review
Messages (19)
msg159080 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-23 21:04
The utf-8 decoder is already well optimized. I propose a patch, which accelerates the utf-8 decoder for some of the frequent cases even more (+10-30%). In particular, for 2-bites non-latin1 codes will get about +30%.

This is not the final result of optimization. It may be possible to optimize the decoding of the ascii and mostly-ascii text (up to the speed of memcpy), decoding of text with occasional errors, reduce code duplication. But I'm not sure of the success.

Related issues:
[issue4868] Faster utf-8 decoding
[issue13417] faster utf-8 decoding
[issue14419] Faster ascii decoding
[issue14624] Faster utf-16 decoder
[issue14625] Faster utf-32 decoder
msg159082 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-23 21:06
Here are the results of benchmarking (numbers in MB/s).

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                          Py2.7         Py3.2         Py3.3        patch

utf-8     'A'*10000                       191 (+790%)   1170 (+45%)   1664 (+2%)    1700
utf-8     '\x80'*10000                    187 (+4%)     219 (-11%)    172 (+13%)    194
utf-8       '\x80'+'A'*9999               191 (+98%)    1152 (-67%)   376 (+1%)     378
utf-8     '\u0100'*10000                  188 (+15%)    221 (-2%)     164 (+32%)    217
utf-8       '\u0100'+'A'*9999             191 (+103%)   1150 (-66%)   382 (+1%)     387
utf-8       '\u0100'+'\x80'*9999          188 (+15%)    221 (-2%)     164 (+32%)    217
utf-8     '\u8000'*10000                  244 (-12%)    263 (-18%)    191 (+13%)    215
utf-8       '\u8000'+'A'*9999             191 (+102%)   1174 (-67%)   382 (+1%)     386
utf-8       '\u8000'+'\x80'*9999          188 (+15%)    216 (+0%)     164 (+32%)    217
utf-8       '\u8000'+'\u0100'*9999        188 (+15%)    216 (+0%)     164 (+32%)    217
utf-8     '\U00010000'*10000              251 (-15%)    248 (-14%)    199 (+7%)     213
utf-8       '\U00010000'+'A'*9999         191 (+97%)    1173 (-68%)   372 (+1%)     376
utf-8       '\U00010000'+'\x80'*9999      188 (+21%)    221 (+3%)     180 (+26%)    227
utf-8       '\U00010000'+'\u0100'*9999    188 (+21%)    221 (+3%)     180 (+26%)    227
utf-8       '\U00010000'+'\u8000'*9999    244 (-9%)     263 (-16%)    201 (+10%)    221

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                          Py2.7         Py3.2         Py3.3        patch

utf-8     'A'*10000                       117 (+414%)   349 (+72%)    597 (+1%)     601
utf-8     '\x80'*10000                    86 (-5%)      89 (-8%)      67 (+22%)     82
utf-8       '\x80'+'A'*9999               117 (+6%)     340 (-64%)    126 (-2%)     124
utf-8     '\u0100'*10000                  86 (-2%)      89 (-6%)      66 (+27%)     84
utf-8       '\u0100'+'A'*9999             117 (+5%)     339 (-64%)    78 (+58%)     123
utf-8       '\u0100'+'\x80'*9999          86 (-2%)      89 (-6%)      66 (+27%)     84
utf-8     '\u8000'*10000                  109 (-26%)    98 (-17%)     71 (+14%)     81
utf-8       '\u8000'+'A'*9999             116 (+7%)     339 (-63%)    78 (+59%)     124
utf-8       '\u8000'+'\x80'*9999          86 (-3%)      89 (-7%)      66 (+26%)     83
utf-8       '\u8000'+'\u0100'*9999        86 (-3%)      89 (-7%)      66 (+26%)     83
utf-8     '\U00010000'*10000              106 (-14%)    105 (-13%)    81 (+12%)     91
utf-8       '\U00010000'+'A'*9999         116 (+12%)    338 (-62%)    127 (+2%)     130
utf-8       '\U00010000'+'\x80'*9999      86 (+6%)      88 (+3%)      69 (+32%)     91
utf-8       '\U00010000'+'\u0100'*9999    86 (+6%)      88 (+3%)      69 (+32%)     91
utf-8       '\U00010000'+'\u8000'*9999    109 (-24%)    98 (-15%)     74 (+12%)     83

The results were ambiguous (everywhere plus, but in different ways). I
would like to see the results for 64-bit platforms. For scripts see
issue14624.
msg159085 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-23 21:23
64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz:

                                          vanilla 3.3   patched
utf-8     'A'*10000                       6668 (+7%)	7145
utf-8         'A'*9999+'\x80'             2358 (+3%)	2418
utf-8         'A'*9999+'\u0100'           2306 (+0%)	2311
utf-8         'A'*9999+'\u8000'           2299 (+0%)	2309
utf-8         'A'*9999+'\U00010000'       2373 (-4%)	2278
utf-8     '\x80'*10000                    366 (+53%)	559
utf-8       '\x80'+'A'*9999               859 (+1%)	868
utf-8         '\x80'*9999+'\u0100'        529 (+5%)	558
utf-8         '\x80'*9999+'\u8000'        529 (+5%)	558
utf-8         '\x80'*9999+'\U00010000'    529 (+5%)	558
utf-8     '\u0100'*10000                  520 (+6%)	549
utf-8       '\u0100'+'A'*9999             822 (+0%)	823
utf-8       '\u0100'+'\x80'*9999          519 (+6%)	549
utf-8         '\u0100'*9999+'\u8000'      520 (+6%)	549
utf-8         '\u0100'*9999+'\U00010000'  520 (+6%)	549
utf-8     '\u8000'*10000                  470 (+4%)	491
utf-8       '\u8000'+'A'*9999             822 (+0%)	822
utf-8       '\u8000'+'\x80'*9999          509 (+8%)	549
utf-8       '\u8000'+'\u0100'*9999        509 (+8%)	549
utf-8         '\u8000'*9999+'\U00010000'  470 (-4%)	451
utf-8     '\U00010000'*10000              483 (-6%)	453
utf-8       '\U00010000'+'A'*9999         938 (-1%)	926
utf-8       '\U00010000'+'\x80'*9999      561 (+6%)	595
utf-8       '\U00010000'+'\u0100'*9999    561 (+6%)	595
utf-8       '\U00010000'+'\u8000'*9999    503 (-4%)	482
msg159086 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-23 21:33
> 64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz: (...)

Hum, the patch doesn't look very interesting if it only optimize one
specific case:

> utf-8     '\x80'*10000                    366 (+53%)    559
msg159124 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-24 08:27
Thank you, Antoine. It is interesting results, that on 64 bits greatly
accelerated the case, which on 32 bits sped up a little. It was the
pathology that a 2-byte to UCS1 was decoded in 1.5x slower than a 2-byte
to UCS2. Interestingly, a small acceleration for the other cases are
random deviations or consequential effect? Strange looks like the
difference for ascii-only text, this branch is not affected by the
patch. Except that the consequences of global optimization. The
deceleration of the decoding of the 4-byte data is expected.

Here is a patch, which is risky reception with signed numbers. For me,
it shows the acceleration of a few percent in comparison with the
previous patch. But I can not recommend it, it looks too hacker for such
a small improvement. It will not work on the exotic platforms where
signed numbers are implemented not as complement code (but Python is not
supports such platforms).
msg159132 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-24 11:22
I'm -1 on using signed char in the implementation. If this gives any advantage, it's because the compiler is not able to generate as efficient code for unsigned char as it does for signed char. So the performance results may again change if you switch compilers, or use the next compiler version.

The code should do what is *logically* correct; IMO, UTF-8 is really a sequence of unsigned bytes, conceptually.

So if you want to demonstrate any performance improvements, you need to do so with unsigned chars.
msg159134 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-24 11:53
> I'm -1 on using signed char in the implementation.

I completely agree with you, for these and for other not mentioned
reasons. So I don't released this patch yesterday, and did not suggest
it to accept. I showed him just out of curiosity -- whether the effect
is stronger on a 64-bit platform? Although this technique will not be
accepted, it sets the bar that can be achieved (if it's worth it).
msg159174 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-24 17:46
Here are two new patches. The first one takes into account the Martin
wishes about comments. The second also rejects optimization for ASCII.

On the Intel Atom last patch annihilates acceleration for some cases
(mostly-ascii with UCS2 data):

                                          vanilla     patch1      patch3

utf-8         'A'*9999+'\u0100'           124 (+8%)   288 (-53%)  134
utf-8         'A'*9999+'\u8000'           124 (+8%)   291 (-54%)  134
utf-8       '\u0100'+'A'*9999             78 (+5%)    123 (-33%)  82
utf-8       '\u8000'+'A'*9999             78 (+5%)    124 (-34%)  82

On the AMD Athlon there is no noticeable effect.
msg159907 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-04 07:08
> title: More fast utf-8 decoding -> Faster utf-8 decoding

Éric, there is already an issue (#4868) with this title.
msg159908 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-04 07:24
There is nothing wrong with two issues having the same title. Of course, it would be best if the title reflected the *actual* defect or change, such as "specialize UTF-8 decoding by character width", or some such.

In any case, the title change is desirable since the original title was ungrammatical. If you wanted to point out that this really is an augmented, escalated rise, then "Even faster utf-8 decoded", "amazingly faster UTF-8 decoding", or "unbelievably faster utf-8 decoding" could have worked :-)
msg159910 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-04 07:59
Thank you, Martin, this is what I had in mind. Lost in translation. ;)
msg159944 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-04 16:49
64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz:

                                          vanilla 3.3   patch 2         patch 3

utf-8     'A'*10000                       6931 (+3%)	7115 (+0%)	7117
utf-8         'A'*9999+'\x80'             2347 (+1%)	2410 (-2%)	2360
utf-8         'A'*9999+'\u0100'           2279 (+1%)	2282 (+1%)	2310
utf-8         'A'*9999+'\u8000'           2264 (+2%)	2275 (+1%)	2300
utf-8         'A'*9999+'\U00010000'       2351 (+0%)	2283 (+3%)	2359
utf-8     '\x80'*10000                    516 (+8%)	558 (+0%)	559
utf-8       '\x80'+'A'*9999               859 (+0%)	868 (-1%)	860
utf-8         '\x80'*9999+'\u0100'        526 (+6%)	558 (+0%)	558
utf-8         '\x80'*9999+'\u8000'        535 (+4%)	558 (+0%)	558
utf-8         '\x80'*9999+'\U00010000'    525 (+6%)	559 (-0%)	558
utf-8     '\u0100'*10000                  517 (+6%)	548 (+0%)	548
utf-8       '\u0100'+'A'*9999             818 (+0%)	820 (+0%)	821
utf-8       '\u0100'+'\x80'*9999          517 (+6%)	548 (+0%)	548
utf-8         '\u0100'*9999+'\u8000'      525 (+4%)	548 (+0%)	548
utf-8         '\u0100'*9999+'\U00010000'  517 (+6%)	549 (+0%)	549
utf-8     '\u8000'*10000                  490 (-8%)	433 (+4%)	451
utf-8       '\u8000'+'A'*9999             818 (+0%)	819 (+0%)	821
utf-8       '\u8000'+'\x80'*9999          529 (+4%)	548 (+0%)	548
utf-8       '\u8000'+'\u0100'*9999        529 (+4%)	548 (+0%)	548
utf-8         '\u8000'*9999+'\U00010000'  470 (-4%)	451 (+0%)	451
utf-8     '\U00010000'*10000              554 (-18%)	427 (+6%)	453
utf-8       '\U00010000'+'A'*9999         938 (+0%)	927 (+2%)	941
utf-8       '\U00010000'+'\x80'*9999      572 (+4%)	595 (+0%)	595
utf-8       '\U00010000'+'\u0100'*9999    571 (+4%)	595 (+0%)	595
utf-8       '\U00010000'+'\u8000'*9999    503 (-4%)	481 (+0%)	482
msg160011 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-05 17:50
Well, it seems, 64-bit processors are smart enough to not feel the need
for this optimization. On 32-bit platforms I see a noticeable increase
in speed.

I am now working on a more advanced optimization, which now shows a gain
of +20-60% compared with the previous patches, but I hope to increase a
gain by +50%-100%.
msg160020 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-05 19:02
I'll be closing this issue at this point. Serhiy: I don't think the bug tracker should be used to evolve work in progress (except when responding to reviews received). Use a Mercurial clone for that instead. By posting a patch here, you are requesting that it be reviewed and considered - please understand that you consume a lot of people's time by such a posting.

At this point, it appears that you don't intend to submit any of these patches for inclusion into Python. If you ever do want to contribute something in this area, please create a new issue.
msg160025 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-05-05 19:19
> I'll be closing this issue at this point. Serhiy: I don't think the
> bug tracker should be used to evolve work in progress (except when
> responding to reviews received). Use a Mercurial clone for that
> instead. By posting a patch here, you are requesting that it be
> reviewed and considered - please understand that you consume a lot of
> people's time by such a posting.

That's not very nice. If Serhiy wants feedback on his work, he
definitely has to post *somewhere*. The bug tracker sounds like a
reasonable place (certainly more reasonable than python-dev).
msg160043 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-05-05 22:06
> That's not very nice. If Serhiy wants feedback on his work, he
> definitely has to post *somewhere*. The bug tracker sounds like a
> reasonable place (certainly more reasonable than python-dev).

I completely disagree (and I really tried to be nice).

It is my utmost belief that the tracker must not be used for
work-in-progress. For any open issue, numerous people review the
issue, and even if they spend only a few minutes, this easily adds
up to a lot of wasted time if there isn't anything to be done about
an issue.

OTOH, discussing it on python-dev indeed seems more appropriate:
even though the readership is larger, people know that they can safely
skip over messages that clearly don't need their attention. So if
Serhiy posts a message titled "UTF-8 performance", people will hit
the delete button very quickly if they are not interested.

However, it would really be best in this case if Serhiy takes a step
back, and analyzes the performance of the current decoder carefully,
then proposes a patch which undoubtedly improves the performance and
is meanwhile also maintainable.

He may come to the conclusion that further improvement isn't really
possible or reasonable, in which case it would be good if he posted
his findings to python-dev.
msg160082 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-05-06 13:37
I understand Martin point, but I think 95% of issues in the bugtracker are "work in progress", mine included.

Maybe the issue is that Serhiy hasn't made a concrete proposal to be tested & integrated. It seems to be more an exploratory work.

I am in the nosy list because I am interested in this work.
msg160102 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-06 17:51
Martin, sorry to have wasted your time. I understand that you are busy,
so I'm not too worried not receiving a feedback for ten days.

> At this point, it appears that you don't intend to submit any of these patches for inclusion into Python.

I'm at a loss. What causes such an impression? I quickly reacting to the
comments, responding by the new patches. I take into account your
comments, and if I do not agree, reinforce my opinion by the benchmarks.
I suggest only cleaned and well-tested code. I provide tools for
benchmarking. What am I doing wrong? May be my bad English has been
misunderstood?
msg160185 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-08 05:02
See issue14738 for advanced optimization.
History
Date User Action Args
2012-05-08 05:02:12serhiy.storchakasetmessages: + msg160185
2012-05-08 03:29:09eric.araujosetnosy: + eric.araujo
2012-05-06 17:51:16serhiy.storchakasetmessages: + msg160102
2012-05-06 13:37:27jceasetmessages: + msg160082
2012-05-05 22:06:18loewissetmessages: + msg160043
2012-05-05 19:19:54pitrousetmessages: + msg160025
2012-05-05 19:02:13loewissetstatus: open -> closed
resolution: rejected
messages: + msg160020
2012-05-05 17:50:47serhiy.storchakasetmessages: + msg160011
2012-05-04 16:49:54pitrousetmessages: + msg159944
2012-05-04 07:59:42serhiy.storchakasetmessages: + msg159910
2012-05-04 07:24:16loewissetmessages: + msg159908
2012-05-04 07:08:26serhiy.storchakasetmessages: + msg159907
2012-05-04 02:21:22eric.araujosettitle: More fast utf-8 decoding -> Faster utf-8 decoding
2012-04-24 17:46:16serhiy.storchakasetfiles: + decode_utf8_2.patch, decode_utf8_3.patch

messages: + msg159174
2012-04-24 11:53:59serhiy.storchakasetmessages: + msg159134
2012-04-24 11:22:06loewissetnosy: + loewis
messages: + msg159132
2012-04-24 08:37:51serhiy.storchakasetfiles: + utf8-signed.diff
2012-04-24 08:27:32serhiy.storchakasetmessages: + msg159124
2012-04-24 04:52:26Arfreversetnosy: + Arfrever
2012-04-23 21:59:38jceasetnosy: + jcea
2012-04-23 21:33:12vstinnersetmessages: + msg159086
2012-04-23 21:23:13pitrousetmessages: + msg159085
2012-04-23 21:06:57serhiy.storchakasetmessages: + msg159082
2012-04-23 21:04:06serhiy.storchakacreate