
Faster utf-8 decoding #58859

Closed
serhiy-storchaka opened this issue Apr 23, 2012 · 19 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@serhiy-storchaka
Member

BPO 14654
Nosy @loewis, @jcea, @pitrou, @vstinner, @merwok, @serhiy-storchaka
Files
  • decode_utf8.patch
  • utf8-signed.diff: Non-recommended hack
  • decode_utf8_2.patch
  • decode_utf8_3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-05.19:02:13.732>
    created_at = <Date 2012-04-23.21:04:06.506>
    labels = ['interpreter-core', 'performance']
    title = 'Faster utf-8 decoding'
    updated_at = <Date 2012-05-08.05:02:12.346>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-05-08.05:02:12.346>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-05.19:02:13.732>
    closer = 'loewis'
    components = ['Interpreter Core']
    creation = <Date 2012-04-23.21:04:06.506>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['25326', '25338', '25342', '25343']
    hgrepos = []
    issue_num = 14654
    keywords = ['patch']
    message_count = 19.0
    messages = ['159080', '159082', '159085', '159086', '159124', '159132', '159134', '159174', '159907', '159908', '159910', '159944', '160011', '160020', '160025', '160043', '160082', '160102', '160185']
    nosy_count = 7.0
    nosy_names = ['loewis', 'jcea', 'pitrou', 'vstinner', 'eric.araujo', 'Arfrever', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue14654'
    versions = ['Python 3.3']

    @serhiy-storchaka
    Member Author

    The utf-8 decoder is already well optimized. I propose a patch that accelerates it further for some frequent cases (+10-30%). In particular, decoding 2-byte non-latin1 code points becomes about 30% faster.

    This is not the final result of the optimization. It may be possible to further optimize decoding of ASCII and mostly-ASCII text (up to the speed of memcpy), optimize decoding of text with occasional errors, and reduce code duplication. But I am not sure of success.

    Related issues:
    [bpo-4868] Faster utf-8 decoding
    [bpo-13417] faster utf-8 decoding
    [bpo-14419] Faster ascii decoding
    [bpo-14624] Faster utf-16 decoder
    [bpo-14625] Faster utf-32 decoder
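To make the case concrete, here is a minimal pure-Python model of the 2-byte fast path being discussed (the actual patch modifies the C decoder; this helper is purely illustrative and is not taken from any of the attached patches):

```python
def decode_2byte(data: bytes) -> str:
    """Toy UTF-8 decoder handling only ASCII and 2-byte sequences."""
    out = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c < 0x80:
            # ASCII fast path: one byte, one code point.
            out.append(chr(c))
            i += 1
        elif 0xC2 <= c <= 0xDF and i + 1 < n and 0x80 <= data[i + 1] <= 0xBF:
            # 2-byte sequence 110xxxxx 10xxxxxx: 5 bits from the lead byte,
            # 6 bits from the continuation byte (covers U+0080..U+07FF).
            out.append(chr(((c & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise UnicodeDecodeError("utf-8", data, i, i + 1,
                                     "not handled by this sketch")
    return "".join(out)
```

Under PEP 393, a 2-byte sequence yields a UCS1 string if the code point is below U+0100 and a UCS2 string otherwise, which is presumably why the 2-byte non-latin1 case is singled out above.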

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 23, 2012
    @serhiy-storchaka
    Member Author

    Here are the results of benchmarking (numbers in MB/s).

    On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                               Py2.7         Py3.2         Py3.3  patch

    utf-8  'A'*10000                     191 (+790%)   1170 (+45%)   1664  (+2%)   1700
    utf-8  '\x80'*10000                  187   (+4%)    219 (-11%)    172 (+13%)    194
    utf-8  '\x80'+'A'*9999               191  (+98%)   1152 (-67%)    376  (+1%)    378
    utf-8  '\u0100'*10000                188  (+15%)    221  (-2%)    164 (+32%)    217
    utf-8  '\u0100'+'A'*9999             191 (+103%)   1150 (-66%)    382  (+1%)    387
    utf-8  '\u0100'+'\x80'*9999          188  (+15%)    221  (-2%)    164 (+32%)    217
    utf-8  '\u8000'*10000                244  (-12%)    263 (-18%)    191 (+13%)    215
    utf-8  '\u8000'+'A'*9999             191 (+102%)   1174 (-67%)    382  (+1%)    386
    utf-8  '\u8000'+'\x80'*9999          188  (+15%)    216  (+0%)    164 (+32%)    217
    utf-8  '\u8000'+'\u0100'*9999        188  (+15%)    216  (+0%)    164 (+32%)    217
    utf-8  '\U00010000'*10000            251  (-15%)    248 (-14%)    199  (+7%)    213
    utf-8  '\U00010000'+'A'*9999         191  (+97%)   1173 (-68%)    372  (+1%)    376
    utf-8  '\U00010000'+'\x80'*9999      188  (+21%)    221  (+3%)    180 (+26%)    227
    utf-8  '\U00010000'+'\u0100'*9999    188  (+21%)    221  (+3%)    180 (+26%)    227
    utf-8  '\U00010000'+'\u8000'*9999    244   (-9%)    263 (-16%)    201 (+10%)    221

    On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                               Py2.7         Py3.2         Py3.3  patch

    utf-8  'A'*10000                     117 (+414%)    349 (+72%)    597  (+1%)    601
    utf-8  '\x80'*10000                   86   (-5%)     89  (-8%)     67 (+22%)     82
    utf-8  '\x80'+'A'*9999               117   (+6%)    340 (-64%)    126  (-2%)    124
    utf-8  '\u0100'*10000                 86   (-2%)     89  (-6%)     66 (+27%)     84
    utf-8  '\u0100'+'A'*9999             117   (+5%)    339 (-64%)     78 (+58%)    123
    utf-8  '\u0100'+'\x80'*9999           86   (-2%)     89  (-6%)     66 (+27%)     84
    utf-8  '\u8000'*10000                109  (-26%)     98 (-17%)     71 (+14%)     81
    utf-8  '\u8000'+'A'*9999             116   (+7%)    339 (-63%)     78 (+59%)    124
    utf-8  '\u8000'+'\x80'*9999           86   (-3%)     89  (-7%)     66 (+26%)     83
    utf-8  '\u8000'+'\u0100'*9999         86   (-3%)     89  (-7%)     66 (+26%)     83
    utf-8  '\U00010000'*10000            106  (-14%)    105 (-13%)     81 (+12%)     91
    utf-8  '\U00010000'+'A'*9999         116  (+12%)    338 (-62%)    127  (+2%)    130
    utf-8  '\U00010000'+'\x80'*9999       86   (+6%)     88  (+3%)     69 (+32%)     91
    utf-8  '\U00010000'+'\u0100'*9999     86   (+6%)     88  (+3%)     69 (+32%)     91
    utf-8  '\U00010000'+'\u8000'*9999    109  (-24%)     98 (-15%)     74 (+12%)     83

    The results are mixed (a gain everywhere, but of varying size). I
    would like to see results for 64-bit platforms. For the benchmark
    scripts see bpo-14624.
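As a rough sketch of how such MB/s throughput figures can be produced (the actual scripts are attached to bpo-14624 and may differ in detail; the repetition counts below are assumptions):

```python
import timeit

def decode_speed_mb_s(text: str, number: int = 100, repeat: int = 5) -> float:
    """Measure UTF-8 decode throughput in MB/s for one sample string.

    Encodes the sample once, times repeated decoding, and reports the
    best (fastest) run, as is conventional for micro-benchmarks.
    """
    data = text.encode("utf-8")
    best = min(timeit.repeat(lambda: data.decode("utf-8"),
                             number=number, repeat=repeat))
    return len(data) * number / best / 2**20

# Samples mirroring a few of the table rows above.
for sample in ("A" * 10000, "\x80" * 10000, "\u0100" + "A" * 9999):
    print("utf-8  %-20r %8.0f MB/s" % (sample[:8], decode_speed_mb_s(sample)))
```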

    @pitrou
    Member

    pitrou commented Apr 23, 2012

    64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz:

                                         vanilla 3.3   patched

    utf-8  'A'*10000                    6668  (+7%)   7145
    utf-8  'A'*9999+'\x80'              2358  (+3%)   2418
    utf-8  'A'*9999+'\u0100'            2306  (+0%)   2311
    utf-8  'A'*9999+'\u8000'            2299  (+0%)   2309
    utf-8  'A'*9999+'\U00010000'        2373  (-4%)   2278
    utf-8  '\x80'*10000                  366 (+53%)    559
    utf-8  '\x80'+'A'*9999               859  (+1%)    868
    utf-8  '\x80'*9999+'\u0100'          529  (+5%)    558
    utf-8  '\x80'*9999+'\u8000'          529  (+5%)    558
    utf-8  '\x80'*9999+'\U00010000'      529  (+5%)    558
    utf-8  '\u0100'*10000                520  (+6%)    549
    utf-8  '\u0100'+'A'*9999             822  (+0%)    823
    utf-8  '\u0100'+'\x80'*9999          519  (+6%)    549
    utf-8  '\u0100'*9999+'\u8000'        520  (+6%)    549
    utf-8  '\u0100'*9999+'\U00010000'    520  (+6%)    549
    utf-8  '\u8000'*10000                470  (+4%)    491
    utf-8  '\u8000'+'A'*9999             822  (+0%)    822
    utf-8  '\u8000'+'\x80'*9999          509  (+8%)    549
    utf-8  '\u8000'+'\u0100'*9999        509  (+8%)    549
    utf-8  '\u8000'*9999+'\U00010000'    470  (-4%)    451
    utf-8  '\U00010000'*10000            483  (-6%)    453
    utf-8  '\U00010000'+'A'*9999         938  (-1%)    926
    utf-8  '\U00010000'+'\x80'*9999      561  (+6%)    595
    utf-8  '\U00010000'+'\u0100'*9999    561  (+6%)    595
    utf-8  '\U00010000'+'\u8000'*9999    503  (-4%)    482

    @vstinner
    Member

    64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz: (...)

    Hum, the patch doesn't look very interesting if it only optimizes one
    specific case:

    utf-8     '\x80'*10000                    366 (+53%)    559

    @serhiy-storchaka
    Member Author

    Thank you, Antoine. It is an interesting result that the case which
    sped up only a little on 32 bits is greatly accelerated on 64 bits.
    The pathology was that 2-byte sequences decoded to UCS1 were 1.5x
    slower to decode than 2-byte sequences decoded to UCS2. I wonder
    whether the small accelerations in the other cases are random noise
    or a secondary effect. The difference for ASCII-only text looks
    strange, since that branch is not affected by the patch -- unless it
    is a consequence of global optimization. The slowdown when decoding
    4-byte data is expected.

    Here is a patch that uses a risky trick with signed numbers. For me
    it shows a speedup of a few percent compared with the previous patch.
    But I cannot recommend it: it looks too hackish for such a small
    improvement. It will not work on exotic platforms where signed
    integers are not represented in two's complement (but Python does not
    support such platforms).
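A sketch of the signed-number trick in Python (an assumption about what utf8-signed.diff does, based on the classic signed-char test for continuation bytes; the real patch operates on C `signed char`):

```python
def as_signed(byte: int) -> int:
    # Reinterpret an unsigned byte (0..255) as a two's-complement int8,
    # the way a C `signed char` would see it.
    return byte - 256 if byte >= 0x80 else byte

# Unsigned view: a continuation byte satisfies (b & 0xC0) == 0x80,
# a mask plus a compare.
# Signed view: continuation bytes 0x80..0xBF become -128..-65, so a
# single comparison `b < -0x40` suffices -- the hack in question.
for b in range(256):
    assert ((b & 0xC0) == 0x80) == (as_signed(b) < -0x40)
```

The two tests agree for every byte value, but only because of two's-complement representation, which is why the trick was deemed too fragile to accept.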

    @loewis
    Mannequin

    loewis mannequin commented Apr 24, 2012

    I'm -1 on using signed char in the implementation. If this gives any advantage, it's because the compiler is not able to generate as efficient code for unsigned char as it does for signed char. So the performance results may again change if you switch compilers, or use the next compiler version.

    The code should do what is *logically* correct; IMO, UTF-8 is really a sequence of unsigned bytes, conceptually.

    So if you want to demonstrate any performance improvements, you need to do so with unsigned chars.

    @serhiy-storchaka
    Member Author

    I'm -1 on using signed char in the implementation.

    I completely agree with you, for these and for other reasons not
    mentioned. That is why I did not release this patch yesterday and did
    not propose it for acceptance. I showed it just out of curiosity --
    is the effect stronger on a 64-bit platform? Although this technique
    will not be accepted, it sets the bar for what can be achieved (if
    it's worth it).

    @serhiy-storchaka
    Member Author

    Here are two new patches. The first one takes into account Martin's
    wishes about comments. The second one also drops the optimization for
    ASCII.

    On the Intel Atom the last patch eliminates the speedup for some
    cases (mostly-ASCII text with UCS2 data):

                                 vanilla      patch1        patch3

    utf-8  'A'*9999+'\u0100'     124 (+8%)    288 (-53%)    134
    utf-8  'A'*9999+'\u8000'     124 (+8%)    291 (-54%)    134
    utf-8  '\u0100'+'A'*9999      78 (+5%)    123 (-33%)     82
    utf-8  '\u8000'+'A'*9999      78 (+5%)    124 (-34%)     82

    On the AMD Athlon there is no noticeable effect.

    @merwok merwok changed the title More fast utf-8 decoding Faster utf-8 decoding May 4, 2012
    @serhiy-storchaka
    Member Author

    title: More fast utf-8 decoding -> Faster utf-8 decoding

    Éric, there is already an issue (bpo-4868) with this title.

    @loewis
    Mannequin

    loewis mannequin commented May 4, 2012

    There is nothing wrong with two issues having the same title. Of course, it would be best if the title reflected the *actual* defect or change, such as "specialize UTF-8 decoding by character width", or some such.

    In any case, the title change is desirable since the original title was ungrammatical. If you wanted to point out that this really is an augmented, escalated speedup, then "Even faster utf-8 decoding", "amazingly faster UTF-8 decoding", or "unbelievably faster utf-8 decoding" could have worked :-)

    @serhiy-storchaka
    Member Author

    Thank you, Martin, this is what I had in mind. Lost in translation. ;)

    @pitrou
    Member

    pitrou commented May 4, 2012

    64-bit Linux, Intel Core i5-2500K CPU @ 3.30GHz:

                                         vanilla 3.3   patch 2       patch 3

    utf-8  'A'*10000                    6931  (+3%)   7115  (+0%)   7117
    utf-8  'A'*9999+'\x80'              2347  (+1%)   2410  (-2%)   2360
    utf-8  'A'*9999+'\u0100'            2279  (+1%)   2282  (+1%)   2310
    utf-8  'A'*9999+'\u8000'            2264  (+2%)   2275  (+1%)   2300
    utf-8  'A'*9999+'\U00010000'        2351  (+0%)   2283  (+3%)   2359
    utf-8  '\x80'*10000                  516  (+8%)    558  (+0%)    559
    utf-8  '\x80'+'A'*9999               859  (+0%)    868  (-1%)    860
    utf-8  '\x80'*9999+'\u0100'          526  (+6%)    558  (+0%)    558
    utf-8  '\x80'*9999+'\u8000'          535  (+4%)    558  (+0%)    558
    utf-8  '\x80'*9999+'\U00010000'      525  (+6%)    559  (-0%)    558
    utf-8  '\u0100'*10000                517  (+6%)    548  (+0%)    548
    utf-8  '\u0100'+'A'*9999             818  (+0%)    820  (+0%)    821
    utf-8  '\u0100'+'\x80'*9999          517  (+6%)    548  (+0%)    548
    utf-8  '\u0100'*9999+'\u8000'        525  (+4%)    548  (+0%)    548
    utf-8  '\u0100'*9999+'\U00010000'    517  (+6%)    549  (+0%)    549
    utf-8  '\u8000'*10000                490  (-8%)    433  (+4%)    451
    utf-8  '\u8000'+'A'*9999             818  (+0%)    819  (+0%)    821
    utf-8  '\u8000'+'\x80'*9999          529  (+4%)    548  (+0%)    548
    utf-8  '\u8000'+'\u0100'*9999        529  (+4%)    548  (+0%)    548
    utf-8  '\u8000'*9999+'\U00010000'    470  (-4%)    451  (+0%)    451
    utf-8  '\U00010000'*10000            554 (-18%)    427  (+6%)    453
    utf-8  '\U00010000'+'A'*9999         938  (+0%)    927  (+2%)    941
    utf-8  '\U00010000'+'\x80'*9999      572  (+4%)    595  (+0%)    595
    utf-8  '\U00010000'+'\u0100'*9999    571  (+4%)    595  (+0%)    595
    utf-8  '\U00010000'+'\u8000'*9999    503  (-4%)    481  (+0%)    482

    @serhiy-storchaka
    Member Author

    Well, it seems 64-bit processors are smart enough not to need this
    optimization. On 32-bit platforms I see a noticeable increase in
    speed.

    I am now working on a more advanced optimization, which currently
    shows a gain of +20-60% compared with the previous patches, and I
    hope to increase the gain to +50-100%.

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2012

    I'll be closing this issue at this point. Serhiy: I don't think the bug tracker should be used to evolve work in progress (except when responding to reviews received). Use a Mercurial clone for that instead. By posting a patch here, you are requesting that it be reviewed and considered - please understand that you consume a lot of people's time by such a posting.

    At this point, it appears that you don't intend to submit any of these patches for inclusion into Python. If you ever do want to contribute something in this area, please create a new issue.

    @loewis loewis mannequin closed this as completed May 5, 2012
    @pitrou
    Member

    pitrou commented May 5, 2012

    I'll be closing this issue at this point. Serhiy: I don't think the
    bug tracker should be used to evolve work in progress (except when
    responding to reviews received). Use a Mercurial clone for that
    instead. By posting a patch here, you are requesting that it be
    reviewed and considered - please understand that you consume a lot of
    people's time by such a posting.

    That's not very nice. If Serhiy wants feedback on his work, he
    definitely has to post *somewhere*. The bug tracker sounds like a
    reasonable place (certainly more reasonable than python-dev).

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2012

    That's not very nice. If Serhiy wants feedback on his work, he
    definitely has to post *somewhere*. The bug tracker sounds like a
    reasonable place (certainly more reasonable than python-dev).

    I completely disagree (and I really tried to be nice).

    It is my utmost belief that the tracker must not be used for
    work-in-progress. For any open issue, numerous people review the
    issue, and even if they spend only a few minutes, this easily adds
    up to a lot of wasted time if there isn't anything to be done about
    an issue.

    OTOH, discussing it on python-dev indeed seems more appropriate:
    even though the readership is larger, people know that they can safely
    skip over messages that clearly don't need their attention. So if
    Serhiy posts a message titled "UTF-8 performance", people will hit
    the delete button very quickly if they are not interested.

    However, it would really be best in this case if Serhiy takes a step
    back, and analyzes the performance of the current decoder carefully,
    then proposes a patch which undoubtedly improves the performance and
    is meanwhile also maintainable.

    He may come to the conclusion that further improvement isn't really
    possible or reasonable, in which case it would be good if he posted
    his findings to python-dev.

    @jcea
    Member

    jcea commented May 6, 2012

    I understand Martin's point, but I think 95% of issues in the bugtracker are "work in progress", mine included.

    Maybe the issue is that Serhiy hasn't made a concrete proposal to be tested & integrated. It seems to be more an exploratory work.

    I am in the nosy list because I am interested in this work.

    @serhiy-storchaka
    Member Author

    Martin, sorry to have wasted your time. I understand that you are
    busy, so I was not too worried about not receiving feedback for ten
    days.

    At this point, it appears that you don't intend to submit any of these patches for inclusion into Python.

    I'm at a loss. What gives that impression? I react quickly to the
    comments, responding with new patches. I take your comments into
    account, and when I do not agree, I back up my opinion with
    benchmarks. I submit only cleaned-up and well-tested code. I provide
    tools for benchmarking. What am I doing wrong? Maybe my bad English
    has been misunderstood?

    @serhiy-storchaka
    Member Author

    See bpo-14738 for advanced optimization.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022