msg161652 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-26 09:11 |
As strange as it may seem, but using a simple trick was made UTF-8 decoding even more speed up.
Here are the benchmark results.
On 32-bit Linux, AMD Athlon 64 X2:
vanilla patched
utf-8 'A'*10000 2061 (+3%) 2115
utf-8 '\x80'*10000 383 (-7%) 355
utf-8 '\x80'+'A'*9999 1273 (+1%) 1290
utf-8 '\u0100'*10000 382 (+47%) 562
utf-8 '\u0100'+'A'*9999 1239 (+1%) 1253
utf-8 '\u0100'+'\x80'*9999 383 (+47%) 562
utf-8 '\u8000'*10000 434 (-6%) 409
utf-8 '\u8000'+'A'*9999 1245 (+1%) 1256
utf-8 '\u8000'+'\x80'*9999 382 (+47%) 560
utf-8 '\u8000'+'\u0100'*9999 383 (+44%) 553
utf-8 '\U00010000'*10000 358 (+4%) 373
utf-8 '\U00010000'+'A'*9999 1171 (+0%) 1176
utf-8 '\U00010000'+'\x80'*9999 381 (+44%) 548
utf-8 '\U00010000'+'\u0100'*9999 381 (+44%) 548
utf-8 '\U00010000'+'\u8000'*9999 404 (+0%) 406
On 32-bit Linux, Intel Atom N570:
vanilla patched
utf-8 'A'*10000 623 (+0%) 626
utf-8 '\x80'*10000 145 (+15%) 167
utf-8 '\x80'+'A'*9999 354 (+2%) 362
utf-8 '\u0100'*10000 164 (+10%) 181
utf-8 '\u0100'+'A'*9999 343 (-0%) 342
utf-8 '\u0100'+'\x80'*9999 164 (+11%) 182
utf-8 '\u8000'*10000 175 (+5%) 183
utf-8 '\u8000'+'A'*9999 349 (+0%) 349
utf-8 '\u8000'+'\x80'*9999 164 (+11%) 182
utf-8 '\u8000'+'\u0100'*9999 164 (+10%) 181
utf-8 '\U00010000'*10000 152 (+11%) 168
utf-8 '\U00010000'+'A'*9999 313 (+0%) 313
utf-8 '\U00010000'+'\x80'*9999 161 (+11%) 179
utf-8 '\U00010000'+'\u0100'*9999 161 (+11%) 179
utf-8 '\U00010000'+'\u8000'*9999 160 (+11%) 177
|
msg161653 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-26 09:19 |
I see a slight increase under 64-bit Linux with gcc 4.5.2, too:
vanilla patched
utf-8 'A'*10000 7857 (+4%) 8210
utf-8 'A'*9999+'\x80' 5392 (+8%) 5843
utf-8 'A'*9999+'\u0100' 2119 (+3%) 2173
utf-8 'A'*9999+'\u8000' 2121 (+2%) 2172
utf-8 'A'*9999+'\U00010000' 2248 (+2%) 2293
utf-8 '\x80'*10000 1015 (+1%) 1021
utf-8 '\x80'+'A'*9999 2747 (+5%) 2877
utf-8 '\x80'*9999+'\u0100' 868 (+0%) 869
utf-8 '\x80'*9999+'\u8000' 857 (+2%) 870
utf-8 '\x80'*9999+'\U00010000' 877 (+0%) 881
utf-8 '\u0100'*10000 1016 (+16%) 1181
utf-8 '\u0100'+'A'*9999 2506 (+3%) 2592
utf-8 '\u0100'+'\x80'*9999 1015 (+16%) 1179
utf-8 '\u0100'*9999+'\u8000' 1015 (+16%) 1182
utf-8 '\u0100'*9999+'\U00010000' 875 (+13%) 992
utf-8 '\u8000'*10000 836 (+18%) 985
utf-8 '\u8000'+'A'*9999 2508 (+3%) 2588
utf-8 '\u8000'+'\x80'*9999 1015 (+16%) 1182
utf-8 '\u8000'+'\u0100'*9999 1014 (+17%) 1182
utf-8 '\u8000'*9999+'\U00010000' 767 (+17%) 894
utf-8 '\U00010000'*10000 730 (+0%) 732
utf-8 '\U00010000'+'A'*9999 2542 (+2%) 2599
utf-8 '\U00010000'+'\x80'*9999 1013 (+17%) 1182
utf-8 '\U00010000'+'\u0100'*9999 1013 (+17%) 1181
utf-8 '\U00010000'+'\u8000'*9999 727 (+0%) 728
|
msg161654 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-26 09:20 |
It seems the patch relies on a two's complement representation of integers. Mark, do you think that's ok?
|
msg161656 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-26 09:52 |
> It seems the patch relies on a two's complement representation of integers. Mark, do you think that's ok?
Yes, the patch depends on two facts -- 8-bit bytes and a two's
complement representation of integers. That's why I call it a trick.
However, today CPython will not work on other platforms. However, we can
wrap macro definition in #if/#else/#end and provide the traditional form
(but I don't remember how to test a two's complement representation in
compile time).
|
msg161657 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-05-26 10:02 |
The C standard says, in 6.3.1.3/3
Otherwise [*], the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
[*]: the value cannot be exactly converted, and the target type is not unsigned.
We shouldn't be using unsigned->signed conversions where the source value is out of range for the signed type.
|
msg161711 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-05-27 15:49 |
Yes, this is an implementation-dependent behavior (and on the supported platforms it is implemented well in a certain way).
However, if the continuation byte check to do the simplest way ((ch) >= 0x80 && (ch) < 0xC0), this has the same effect (speed up to +45%) on AMD Athlon.
vanilla patched
utf-8 'A'*10000 2061 (-2%) 2018
utf-8 '\x80'*10000 383 (+9%) 416
utf-8 '\x80'+'A'*9999 1273 (+3%) 1315
utf-8 '\u0100'*10000 382 (+46%) 558
utf-8 '\u0100'+'A'*9999 1239 (+0%) 1245
utf-8 '\u0100'+'\x80'*9999 383 (+46%) 558
utf-8 '\u8000'*10000 434 (-6%) 408
utf-8 '\u8000'+'A'*9999 1245 (+0%) 1245
utf-8 '\u8000'+'\x80'*9999 382 (+46%) 556
utf-8 '\u8000'+'\u0100'*9999 383 (+45%) 556
utf-8 '\U00010000'*10000 358 (+0%) 359
utf-8 '\U00010000'+'A'*9999 1171 (-0%) 1170
utf-8 '\U00010000'+'\x80'*9999 381 (+30%) 495
utf-8 '\U00010000'+'\u0100'*9999 381 (+30%) 495
utf-8 '\U00010000'+'\u8000'*9999 404 (-5%) 385
On Intel Atom the results did not change or become a little better.
vanilla patched
utf-8 'A'*10000 623 (+3%) 642
utf-8 '\x80'*10000 145 (+9%) 158
utf-8 '\x80'+'A'*9999 354 (+4%) 367
utf-8 '\u0100'*10000 164 (+0%) 164
utf-8 '\u0100'+'A'*9999 343 (+2%) 351
utf-8 '\u0100'+'\x80'*9999 164 (+1%) 165
utf-8 '\u8000'*10000 175 (-2%) 171
utf-8 '\u8000'+'A'*9999 349 (+3%) 359
utf-8 '\u8000'+'\x80'*9999 164 (+0%) 164
utf-8 '\u8000'+'\u0100'*9999 164 (+0%) 164
utf-8 '\U00010000'*10000 152 (-1%) 150
utf-8 '\U00010000'+'A'*9999 313 (+2%) 319
utf-8 '\U00010000'+'\x80'*9999 161 (+1%) 162
utf-8 '\U00010000'+'\u0100'*9999 161 (+1%) 162
utf-8 '\U00010000'+'\u8000'*9999 160 (-2%) 156
|
msg161715 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-05-27 18:37 |
> However, if the continuation byte check to do the simplest way ((ch) >=
> 0x80 && (ch) < 0xC0), this has the same effect (speed up to +45%) on
> AMD Athlon.
Doesn't produce any significant speedup on Intel Core i5-2500.
|
msg161759 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-05-28 06:42 |
> It seems the patch relies on a two's complement representation of
> integers. Mark, do you think that's ok?
(1) Relying on two's complement integers seems fine to me: we're already relying on it in other places in Python (e.g., bitwise operations for ints in Python 2.x); it seems unlikely Python's going to run into current or future hardware that uses anything else; and any special-case code for ones' complement or sign-magnitude is going to be essentially unused and awkward to test, so it's probably better not to have it in the codebase at all.
In an ideal world, I guess we'd add some configure-time tests for two's complement so that in the unlikely event that Python *does* meet a non two's complement platform the build fails early with a clear message rather than the tests failing in strange ways.
(2) The bit that Martin identifies: relying on conversion from unsigned to signed doing a wraparound modulo 2**<suitable n> is a bit more troubling; it's something that I try to avoid where possible, but that's not always easy. I don't recall ever having encountered this causing problems in practice, though---it feels like a leftover from a non-two's complement world where processors would have a hard time giving wraparound semantics, so the C standard didn't want to require it. Again, if we're going to rely on this, it would probably make sense to have some configure-time checks; and it would be better not to rely on it at all without a really good reason.
|
msg163501 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-06-22 22:31 |
Here is a patch that uses some sort of autodetection.
|
msg163583 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-06-23 11:30 |
Any chance to commit the patch before final feature freeze?
|
msg163586 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-06-23 11:36 |
> Any chance to commit the patch before final feature freeze?
I'll defer to Mark :-)
|
msg163589 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-06-23 11:43 |
Okay, will look at this this afternoon.
|
msg163607 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-06-23 13:33 |
I'm happy to apply the 'decode_utf8_range_check.patch'; I'll do that unless there are objections. The code is clearer than the original, and if we get a speedup into the bargain then I don't see a reason not to apply this.
I'm less comfortable with either the original patch, or the most recent one (decode_utf8_signed_byte-2.patch).
|
msg163611 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2012-06-23 13:51 |
Serhiy, does this patch also fix #8271?
If so, can you also include the tests I wrote for it?
|
msg163672 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2012-06-23 20:45 |
New changeset 3214c9ebcf5e by Mark Dickinson in branch 'default':
Issue #14923: Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka.
http://hg.python.org/cpython/rev/3214c9ebcf5e
|
msg163673 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2012-06-23 20:47 |
Patch applied. Closing.
Ezio: the patch is pure optimization, with no change in semantics; I don't see how it could fix #8271.
|
msg163675 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-06-23 21:10 |
> Serhiy, does this patch also fix #8271?
No, this patch not change behavior. But updated patch for issue 8271 now
contains this patch (I hope this will help merge).
> If so, can you also include the tests I wrote for it?
Your tests included in patch for issue 8271.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:30 | admin | set | github: 59128 |
2012-06-23 21:10:43 | serhiy.storchaka | set | messages:
+ msg163675 |
2012-06-23 20:47:20 | mark.dickinson | set | status: open -> closed resolution: fixed messages:
+ msg163673
|
2012-06-23 20:45:26 | python-dev | set | messages:
+ msg163672 |
2012-06-23 13:51:00 | ezio.melotti | set | messages:
+ msg163611 |
2012-06-23 13:33:24 | mark.dickinson | set | messages:
+ msg163607 |
2012-06-23 11:43:32 | mark.dickinson | set | assignee: mark.dickinson messages:
+ msg163589 |
2012-06-23 11:36:11 | pitrou | set | messages:
+ msg163586 |
2012-06-23 11:30:38 | serhiy.storchaka | set | messages:
+ msg163583 |
2012-06-22 22:31:57 | serhiy.storchaka | set | files:
+ decode_utf8_signed_byte-2.patch
messages:
+ msg163501 |
2012-05-28 06:42:50 | mark.dickinson | set | messages:
+ msg161759 |
2012-05-27 18:37:04 | pitrou | set | messages:
+ msg161715 |
2012-05-27 15:49:48 | serhiy.storchaka | set | files:
+ decode_utf8_range_check.patch
messages:
+ msg161711 |
2012-05-26 10:02:09 | loewis | set | messages:
+ msg161657 |
2012-05-26 09:52:09 | serhiy.storchaka | set | messages:
+ msg161656 |
2012-05-26 09:20:52 | pitrou | set | stage: commit review -> patch review |
2012-05-26 09:20:38 | pitrou | set | messages:
+ msg161654 stage: commit review |
2012-05-26 09:19:46 | pitrou | set | messages:
+ msg161653 |
2012-05-26 09:14:53 | serhiy.storchaka | set | files:
+ bench-diff.py |
2012-05-26 09:13:51 | serhiy.storchaka | set | files:
+ decodebench.py |
2012-05-26 09:11:07 | serhiy.storchaka | create | |