
Use _PyUnicodeWriter API in text decoders #60515

Closed
vstinner opened this issue Oct 24, 2012 · 9 comments
Labels
performance Performance or resource usage

Comments

@vstinner
Member

BPO 16311
Nosy @loewis, @vstinner, @serhiy-storchaka
Files
  • codecs_writer.patch
  • codecs_writer_2.patch
  • decodebench.res: Benchmark results
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-11-06.23:41:02.393>
    created_at = <Date 2012-10-24.18:38:21.359>
    labels = ['performance']
    title = 'Use _PyUnicodeWriter API in text decoders'
    updated_at = <Date 2012-11-07.22:53:54.106>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-11-07.22:53:54.106>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-11-06.23:41:02.393>
    closer = 'python-dev'
    components = []
    creation = <Date 2012-10-24.18:38:21.359>
    creator = 'vstinner'
    dependencies = []
    files = ['27697', '27807', '27808']
    hgrepos = []
    issue_num = 16311
    keywords = ['patch']
    message_count = 9.0
    messages = ['173695', '173697', '174171', '174238', '174273', '174275', '174293', '175034', '175129']
    nosy_count = 4.0
    nosy_names = ['loewis', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue16311'
    versions = ['Python 3.4']

    @vstinner
    Member Author

    The attached patch modifies the text decoders to use the _PyUnicodeWriter API to factorize the code. It removes the unicode_widen() and unicode_putchar() functions.

    • Don't overallocate by default (except for the "raw-unicode-escape"
      codec); enable overallocation on the first decode error (as is done
      currently)
    • _PyUnicodeWriter_Prepare() only overallocates by 25%, instead of 100%
      for unicode_decode_call_errorhandler()
    • Use _PyUnicodeWriter_Prepare() + PyUnicode_WRITE() (two macros)
      instead of unicode_putchar() (a function)
    • The _PyUnicodeWriter structure stores many useful fields, so we don't
      have to pass multiple parameters to functions, only the writer

    I wrote the patch to factorize the code, but it might also be faster.
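
    The growth policy described in the bullet points can be modeled in a few lines of Python. This is only a toy sketch for readers unfamiliar with the writer: the real code is C inside CPython's Objects/unicodeobject.c, and the class and attribute names below are illustrative, not the actual API.

    ```python
    class WriterModel:
        """Toy model of the _PyUnicodeWriter allocation policy (illustrative only)."""

        def __init__(self, min_length):
            # Decoders know an upper bound on the output length in advance,
            # so the buffer starts exact-sized: no overallocation by default.
            self.allocated = min_length
            self.length = 0
            self.overallocate = False  # flipped on the first decode error

        def prepare(self, extra):
            """Model _PyUnicodeWriter_Prepare(): ensure room for `extra` chars."""
            needed = self.length + extra
            if needed <= self.allocated:
                return self.allocated  # enough room, no reallocation
            if self.overallocate:
                # Grow by 25% (the patch lowers this from the previous 100%).
                needed += needed // 4
            self.allocated = needed
            return self.allocated

    w = WriterModel(10)
    w.length = 10
    w.overallocate = True   # simulate having hit the first decode error
    print(w.prepare(2))     # needs 12, overallocated by 25% -> 15
    ```

    On error-free input the model never reallocates, which is the point of the "don't overallocate by default" rule.
    
    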

    @vstinner vstinner added the performance Performance or resource usage label Oct 24, 2012
    @serhiy-storchaka
    Member

    Soon I'll post a patch which speeds up the unicode-escape and raw-unicode-escape decoders by 1.5-3x. There are also not-yet-reviewed patches for the UTF-32 (bpo-14625) and charmap (bpo-14850) decoders, so there will be merge conflicts.

    But I will review the patch.

    @vstinner
    Member Author

    "Soon I'll post a patch which speeds up the unicode-escape and raw-unicode-escape decoders by 1.5-3x. There are also not-yet-reviewed patches for the UTF-32 (bpo-14625) and charmap (bpo-14850) decoders, so there will be merge conflicts."

    codecs_writer.patch doesn't change the core of the decoders much; it mostly changes the code before and after the loop, and the error handling. You can still use PyUnicode_WRITE, PyUnicode_READ, memcpy(), etc.

    "But I will review the patch."

    If you review the patch, please check how the buffer is allocated. It should not be overallocated by default, only on the first error. Overallocation can kill performance when it is not necessary (especially on Windows).

    @serhiy-storchaka
    Member

    I will do some experiments and review tomorrow.

    @serhiy-storchaka
    Member

    I updated the patch to resolve the conflict with bpo-14625.

    @serhiy-storchaka
    Member

    With the patch, the UTF-8 decoder is 20% slower for some data. The UTF-16 decoder is 20% faster for some data and 20% slower for other data. The UTF-32 decoder is slower for much data (even after some optimization; the naive code was up to 50% slower). The standard charmap decoder is 10% slower. Only UTF-7, unicode-escape and raw-unicode-escape have become much faster (unicode-escape and raw-unicode-escape as much as with the bpo-16334 patch).

    Well-optimized decoders do not benefit from _PyUnicodeWriter; they only get a slight slowdown. The patch requires some optimization (as for the UTF-32 decoder) to reduce the negative effect. Non-optimized decoders receive the greatest benefit.

    @vstinner
    Member Author

    I ran the decodebench.py and bench-diff.py scripts from bpo-14624; I
    just replaced repeat=10 with repeat=100 to get more reliable numbers. I
    only see some performance regressions between -5% and -1%, but there
    are some speedups on UTF-8 and UTF-32 (between +11% and +14%). On a
    microbenchmark, numbers in the -10..+10% range just mean "no change".

    Using _PyUnicodeWriter should not change anything in the performance on
    valid data, only the performance of handling decoding errors: the
    overallocation factor is different, as are the code to widen the buffer
    and the code to write replacement characters.
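
    For readers without the bpo-14624 scripts at hand, a decodebench-style measurement can be approximated with the standard timeit module. This is a generic sketch, not the actual decodebench.py, and the reported unit is arbitrary (bigger is faster):

    ```python
    import timeit

    def bench_decode(encoding, text, repeat=10, number=100):
        """Best decode throughput over `repeat` runs of `number` decodes."""
        data = text.encode(encoding)
        timer = timeit.Timer(lambda: data.decode(encoding))
        best = min(timer.repeat(repeat=repeat, number=number))
        # Characters decoded per second of the best run (arbitrary unit).
        return len(text) * number / best

    for encoding in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
        print(f"{encoding:10} 'A'*10000  {bench_decode(encoding, 'A' * 10000):.3e}")
    ```

    Taking the minimum of several repeats is the usual way to suppress scheduler noise on microbenchmarks of this kind.
    
    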

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 6, 2012

    New changeset 7ed9993d53b4 by Victor Stinner in branch 'default':
    Close bpo-16311: Use the _PyUnicodeWriter API in text decoders
    http://hg.python.org/cpython/rev/7ed9993d53b4

    @python-dev python-dev mannequin closed this as completed Nov 6, 2012
    @vstinner
    Member Author

    vstinner commented Nov 7, 2012

    Oh, I forgot my benchmark results.

    decodebench.py results on Linux 32 bits:
    (Linux-3.2.0-32-generic-pae-i686-with-debian-wheezy-sid)

    $ ./python bench-diff.py original writer
    ascii     'A'*10000               4109 (-3%)    3974

    latin1    'A'*10000               3851 (-5%)    3644
    latin1    '\x80'*10000           14832 (-3%)   14430

    utf-8     'A'*10000               3747 (-4%)    3608
    utf-8     '\x80'*10000             976 (-2%)     961
    utf-8     '\u0100'*10000           974 (-2%)     959
    utf-8     '\u8000'*10000           804 (-14%)    694
    utf-8     '\U00010000'*10000       666 (-5%)     635

    utf-16le  'A'*10000               4154 (-1%)    4117
    utf-16le  '\x80'*10000            4055 (-2%)    3988
    utf-16le  '\u0100'*10000          4047 (-2%)    3974
    utf-16le  '\u8000'*10000           917 (-1%)     912
    utf-16le  '\U00010000'*10000       872 (-0%)     870

    utf-16be  'A'*10000               3218 (-1%)    3185
    utf-16be  '\x80'*10000            3163 (-2%)    3114
    utf-16be  '\u0100'*10000          2591 (-1%)    2556
    utf-16be  '\u8000'*10000           979 (-1%)     974
    utf-16be  '\U00010000'*10000       928 (-0%)     925

    utf-32le  'A'*10000               1681 (+12%)   1885
    utf-32le  '\x80'*10000            1697 (+10%)   1865
    utf-32le  '\u0100'*10000          2224 (+1%)    2254
    utf-32le  '\u8000'*10000          2224 (+2%)    2269
    utf-32le  '\U00010000'*10000      2234 (+1%)    2260

    utf-32be  'A'*10000               1685 (+11%)   1868
    utf-32be  '\x80'*10000            1684 (+10%)   1860
    utf-32be  '\u0100'*10000          2223 (+1%)    2253
    utf-32be  '\u8000'*10000          2222 (+1%)    2255
    utf-32be  '\U00010000'*10000      2243 (+1%)    2257

    decodebench.py results on Linux 64 bits:
    (Linux-3.4.9-2.fc16.x86_64-x86_64-with-fedora-16-Verne)

    ascii     'A'*10000              10043 (+1%)   10144

    latin1    'A'*10000               8351 (-1%)    8258
    latin1    '\x80'*10000           19184 (+2%)   19560

    utf-8     'A'*10000               8083 (+5%)    8461
    utf-8     '\x80'*10000             982 (+1%)     993
    utf-8     '\u0100'*10000           984 (+1%)     992
    utf-8     '\u8000'*10000           806 (+31%)   1053
    utf-8     '\U00010000'*10000       639 (+12%)    718

    utf-16le  'A'*10000               5547 (-2%)    5422
    utf-16le  '\x80'*10000            5205 (+1%)    5271
    utf-16le  '\u0100'*10000          4900 (-4%)    4695
    utf-16le  '\u8000'*10000          1062 (+9%)    1154
    utf-16le  '\U00010000'*10000      1040 (+4%)    1078

    utf-16be  'A'*10000               5416 (-5%)    5157
    utf-16be  '\x80'*10000            5077 (-1%)    5011
    utf-16be  '\u0100'*10000          4261 (-1%)    4218
    utf-16be  '\u8000'*10000          1146 (+0%)    1147
    utf-16be  '\U00010000'*10000      1125 (-1%)    1119

    utf-32le  'A'*10000               1743 (+8%)    1880
    utf-32le  '\x80'*10000            1751 (+5%)    1842
    utf-32le  '\u0100'*10000          2114 (+29%)   2721
    utf-32le  '\u8000'*10000          2120 (+28%)   2718
    utf-32le  '\U00010000'*10000      2065 (+30%)   2690

    utf-32be  'A'*10000               1761 (+6%)    1860
    utf-32be  '\x80'*10000            1749 (+6%)    1856
    utf-32be  '\u0100'*10000          2101 (+29%)   2715
    utf-32be  '\u8000'*10000          2083 (+30%)   2715
    utf-32be  '\U00010000'*10000      2058 (+31%)   2689

    Most significant changes:

    • -14% to decode '\u8000'*10000 from UTF-8 on Linux 32 bits
    • +31% to decode '\u8000'*10000 from UTF-8 on Linux 64 bits
    • +28% to +31% to decode UCS-2 and UCS-4 characters from UTF-32 on Linux 64 bits
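
    The (%) column in the tables above is the relative change from the original build to the writer build; a minimal sketch of that computation (bench-diff.py's exact rounding and formatting may differ):

    ```python
    def percent_change(original, writer):
        """Relative speed change between two benchmark results, in percent."""
        return round((writer - original) / original * 100)

    # Two rows from the tables above:
    print(percent_change(806, 1053))   # utf-8 '\u8000'*10000, 64 bits -> 31
    print(percent_change(4109, 3974))  # ascii 'A'*10000, 32 bits -> -3
    ```
    
    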

    @serhiy-storchaka: If you feel able to tune _PyUnicodeWriter to
    improve its performance, please open a new issue.

    I consider the performance changes acceptable and I don't plan to work
    on this topic.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022