classification
Title: Optimize ASCII/latin1 encoder with surrogateescape error handlers
Type: performance
Stage:
Components:
Versions: Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: methane, python-dev, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2015-09-24 12:48 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name | Uploaded
encode_ucs1_surrogateescape.patch | vstinner, 2015-09-24 12:55
encode_ucs1_surrogateescape-2.patch | vstinner, 2015-09-24 14:15
bench.py | vstinner, 2015-09-24 14:16
Messages (6)
msg251516 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-09-24 12:48
The attached patch is based on faster_surrogates_hadling.patch, written by Serhiy Storchaka for issue #24870. It optimizes str.encode('ascii', 'surrogateescape') and str.encode('latin1', 'surrogateescape').
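For readers unfamiliar with the error handler, here is a minimal sketch of the behaviour being optimized. It is not taken from the patch; the helper name encode_escaped is hypothetical and only illustrates what the ascii and latin1 encoders do with lone surrogates when surrogateescape is used.

```python
def encode_escaped(raw: bytes, encoding: str) -> bytes:
    """Decode raw as ASCII with surrogateescape, then encode it back.

    Hypothetical helper for illustration only.
    """
    # Bytes >= 0x80 are not valid ASCII; surrogateescape maps each one
    # to a lone surrogate in the range U+DC80..U+DCFF instead of failing.
    text = raw.decode("ascii", "surrogateescape")
    # Encoding with the same error handler turns each lone surrogate
    # back into the original byte. This is the code path the patch targets.
    return text.encode(encoding, "surrogateescape")


raw = b"abc\x80\xff"
text = raw.decode("ascii", "surrogateescape")
assert text == "abc\udc80\udcff"
assert encode_escaped(raw, "ascii") == raw
assert encode_escaped(raw, "latin1") == raw

# Without the error handler, the same string cannot be encoded.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    pass
else:
    raise AssertionError("strict encoding should have failed")
```

Strings like this typically come from os.fsdecode() or from command-line arguments containing undecodable bytes, which is presumably why the encode path matters for performance.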
msg251518 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-24 12:54
New changeset fa65c32d7134 by Victor Stinner in branch 'default':
Issue #25227: Cleanup unicode_encode_ucs1() error handler
https://hg.python.org/cpython/rev/fa65c32d7134
msg251525 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-09-24 14:15
Updated patch, now with more unit tests.
msg251526 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-09-24 14:16
Result of a micro-benchmark with encode_ucs1_surrogateescape-2.patch.

Common platform:
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
Platform: Linux-4.1.6-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign before:
Date: 2015-09-24 16:12:35
Timer precision: 54 ns
Python version: 3.6.0a0 (default:fa65c32d7134, Sep 24 2015, 16:11:44) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
SCM: hg revision=fa65c32d7134 tag=tip branch=default date="2015-09-24 14:45 +0200"

Platform of campaign after:
Python version: 3.6.0a0 (default:fa65c32d7134+, Sep 24 2015, 16:13:20) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Timer precision: 55 ns
SCM: hg revision=fa65c32d7134+ tag=tip branch=default date="2015-09-24 14:45 +0200"
Date: 2015-09-24 16:13:21

-----------------------+-------------+---------------
ascii                  |      before |          after
-----------------------+-------------+---------------
100 x 10**1 characters | 6.65 us (*) | 1.93 us (-71%)
100 x 10**3 characters |  512 us (*) |  158 us (-69%)
100 x 10**2 characters | 52.2 us (*) | 16.2 us (-69%)
100 x 10**4 characters | 5.09 ms (*) | 1.59 ms (-69%)
-----------------------+-------------+---------------
Total                  | 5.66 ms (*) | 1.77 ms (-69%)
-----------------------+-------------+---------------

-----------------------+-------------+---------------
latin1                 |      before |          after
-----------------------+-------------+---------------
100 x 10**1 characters | 6.24 us (*) | 1.89 us (-70%)
100 x 10**3 characters |  500 us (*) |  160 us (-68%)
100 x 10**2 characters |   51 us (*) | 16.3 us (-68%)
100 x 10**4 characters |    5 ms (*) | 1.59 ms (-68%)
-----------------------+-------------+---------------
Total                  | 5.56 ms (*) | 1.77 ms (-68%)
-----------------------+-------------+---------------

--------+-------------+---------------
Summary |      before |          after
--------+-------------+---------------
ascii   | 5.66 ms (*) | 1.77 ms (-69%)
latin1  | 5.56 ms (*) | 1.77 ms (-68%)
--------+-------------+---------------
Total   | 11.2 ms (*) | 3.53 ms (-69%)
--------+-------------+---------------
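The attached bench.py is not reproduced here. As a rough sketch of how a comparable measurement could be made with the standard timeit module, assuming the input is a string made only of lone surrogates (an assumption; the exact input shape used by bench.py is not stated in this thread), with the hypothetical helper name bench_encode:

```python
import timeit


def bench_encode(encoding: str, length: int, repeat: int = 3, number: int = 100) -> float:
    """Return the best time in seconds for one encode call.

    Hypothetical helper, not the bench.py attached to this issue.
    """
    # Assumed worst case: every character is a lone surrogate, so every
    # character goes through the surrogateescape error handler.
    text = "\udc80" * length
    timer = timeit.Timer(lambda: text.encode(encoding, "surrogateescape"))
    # Take the minimum over several runs to reduce noise from the system.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for encoding in ("ascii", "latin1"):
        for exponent in range(1, 5):
            length = 10 ** exponent
            seconds = bench_encode(encoding, length)
            print(f"{encoding:6} 10**{exponent} characters: {seconds * 1e6:.2f} us")
```

Running this script on builds before and after the change should show the same kind of difference as the tables above, although absolute numbers depend on the machine and on the assumed input.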
msg251841 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-29 10:35
New changeset 128a3f03ddeb by Victor Stinner in branch 'default':
Optimize ascii/latin1+surrogateescape encoders
https://hg.python.org/cpython/rev/128a3f03ddeb
msg251843 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-09-29 10:39
INADA Naoki: The ASCII and latin1 encoders are now up to 3 times as fast when the surrogateescape error handler is used in Python 3.6.
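The "up to 3 times as fast" figure can be checked against the rounded totals in the benchmark summary. A small sketch of that arithmetic, using hypothetical helper names and the before/after totals for ascii and latin1 as reported above:

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new code is (before / after)."""
    return before / after


def reduction_percent(before: float, after: float) -> float:
    """Relative time saved, as reported in the 'after' column."""
    return (1 - after / before) * 100


# Totals in milliseconds, copied from the summary table (rounded values).
ascii_before, ascii_after = 5.66, 1.77
latin1_before, latin1_after = 5.56, 1.77

print(f"ascii:  {speedup(ascii_before, ascii_after):.2f}x, "
      f"-{reduction_percent(ascii_before, ascii_after):.0f}%")
print(f"latin1: {speedup(latin1_before, latin1_after):.2f}x, "
      f"-{reduction_percent(latin1_before, latin1_after):.0f}%")
```

Both ratios come out slightly above 3, which is consistent with the reductions of roughly 68% to 69% shown in the tables.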
History
Date | User | Action | Args
2022-04-11 14:58:21 | admin | set | github: 69414
2015-09-29 10:39:51 | vstinner | set | status: open -> closed; nosy: + methane, serhiy.storchaka; messages: + msg251843; resolution: fixed
2015-09-29 10:35:12 | python-dev | set | messages: + msg251841
2015-09-24 14:16:18 | vstinner | set | files: + bench.py; messages: + msg251526
2015-09-24 14:15:38 | vstinner | set | files: + encode_ucs1_surrogateescape-2.patch; messages: + msg251525
2015-09-24 12:56:00 | vstinner | set | files: + encode_ucs1_surrogateescape.patch; keywords: + patch
2015-09-24 12:54:23 | python-dev | set | nosy: + python-dev; messages: + msg251518
2015-09-24 12:48:03 | vstinner | create |