classification
Title: Faster charmap decoding
Type: performance
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, haypo, lemburg, loewis, pitrou, python-dev, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2012-05-21 22:19 by serhiy.storchaka, last changed 2012-06-16 20:54 by pitrou. This issue is now closed.

Files
File name              Uploaded                            Description
decode_charmap.patch   serhiy.storchaka, 2012-05-21 22:19
charmapdecodebench.py  serhiy.storchaka, 2012-05-21 22:20  Benchmark script
bench-diff.py          serhiy.storchaka, 2012-05-21 22:21  Benchmark results comparator
Messages (3)
msg161301 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-05-21 22:19
Charmap decoders are not as important as UTF decoders, but they are still widely used. With PEP 393 in Python 3.3 they became about 4x slower. The proposed patch restores the performance.

Only the most common case is optimized: a decoder specified by a UCS2 table with length >= 256. Map-based decoders are translated to table-based ones. UCS1 tables are widened to UCS2 by adding a fake 257th character.
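For readers unfamiliar with the two decoder flavours, here is a minimal sketch of table-based versus map-based charmap decoding. It is an illustration only, not part of the patch: it uses `codecs.charmap_decode` (the helper the stock `encodings` modules call) and the `decoding_table` shipped with `encodings.cp1251`. The dict named `mapping` and the string `manual_text` are hypothetical names introduced here.

```python
import codecs
from encodings import cp1251

# The stock cp1251 codec ships a 256-character decoding table (a str).
table = cp1251.decoding_table

# Three bytes that decode to 'A', U+00A0 and U+0402 respectively.
data = bytes([0x41, 0xA0, 0x80])

# Table-based decoding: the mapping argument is a str indexed by byte value.
table_text, consumed = codecs.charmap_decode(data, "strict", table)

# Map-based decoding: the mapping argument is a dict {byte value: code point}.
mapping = {byte: ord(char) for byte, char in enumerate(table)}
map_text, _ = codecs.charmap_decode(data, "strict", mapping)

# Pure-Python picture of what a table lookup amounts to for valid input.
manual_text = "".join(table[byte] for byte in data)

print(ascii(table_text), consumed)
```

All three approaches yield the same string for valid input; the table form is the one the message above describes as the optimized case.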

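Benchmark results (higher is faster; the percentage in parentheses is how much faster the patched build is than that column):

                             3.2           3.3(vanilla)  3.3(patched)

cp1251    'A'*10000          111 (+10%)    31 (+294%)    122
cp1251    '\xa0'*10000       111 (+8%)     29 (+314%)    120
cp1251    '\u0402'*10000     111 (+6%)     25 (+372%)    118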
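The attached charmapdecodebench.py is not reproduced here. As a rough sketch in the same spirit, a measurement like this could be made with `timeit`. The helpers `decode_rate` and `speedup_percent` are hypothetical names, and the characters-per-second unit is an assumption of this sketch, since the issue does not state the units of the table above.

```python
import timeit


def decode_rate(encoding, char, length=10000, repeat=3, number=20):
    """Return the best observed decode throughput in characters per second."""
    data = (char * length).encode(encoding)
    timings = timeit.repeat(lambda: data.decode(encoding),
                            repeat=repeat, number=number)
    return length * number / min(timings)


def speedup_percent(patched, baseline):
    """Return how much faster `patched` is than `baseline`, in percent."""
    return (patched / baseline - 1) * 100


# The same three inputs as in the benchmark table.
for char in ("A", "\xa0", "\u0402"):
    print("cp1251", ascii(char), round(decode_rate("cp1251", char)))

# Recomputes the first 3.3(vanilla) percentage from the table figures.
print(round(speedup_percent(122, 31)))
```

Absolute numbers depend on the machine and the Python build, so only ratios between builds are meaningful.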
msg162989 - Author: Roundup Robot (python-dev) Date: 2012-06-16 20:54
New changeset 8f3a5308f50b by Antoine Pitrou in branch 'default':
Issue #14874: Restore charmap decoding speed to pre-PEP 393 levels.
http://hg.python.org/cpython/rev/8f3a5308f50b
msg162990 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-06-16 20:54
Thank you for the patch! Now pushed to 3.3.
History
Date                 User              Action  Args
2012-06-16 20:54:32  pitrou            set     status: open -> closed; resolution: fixed; messages: + msg162990; stage: patch review -> resolved
2012-06-16 20:54:06  python-dev        set     nosy: + python-dev; messages: + msg162989
2012-06-16 16:44:22  pitrou            set     nosy: + loewis
2012-06-15 17:51:10  pitrou            set     stage: patch review
2012-05-21 22:21:38  serhiy.storchaka  set     files: + bench-diff.py
2012-05-21 22:20:13  serhiy.storchaka  set     files: + charmapdecodebench.py
2012-05-21 22:19:29  serhiy.storchaka  create