
Author vstinner
Recipients ezio.melotti, vstinner
Date 2015-10-04.08:30:31
Message-id <1443947432.21.0.263826556264.issue25301@psf.upfronthosting.co.za>
Content
Results of the microbenchmark on the UTF-8 decoder.

As expected, performance on valid UTF-8 is unchanged, which was an important goal for me.

Decoding with the error handlers optimized by the patch (ignore, replace, surrogateescape) is *much* faster: roughly 88% to 93% less time per run.

backslashreplace is still slow, because I didn't optimize it.
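
For reference, here is a minimal sketch of what each error handler produces on invalid UTF-8. The input bytes and the helper name decode_with are assumptions for illustration only; this is not the benchmark data.

```python
# Illustrative input (an assumption, not the benchmark data):
# valid ASCII with a single invalid byte 0xff in the middle.
data = b"abc\xffdef"

HANDLERS = ("ignore", "replace", "surrogateescape", "backslashreplace")


def decode_with(handler):
    """Decode `data` as UTF-8 using the named error handler."""
    return data.decode("utf-8", errors=handler)


results = {name: decode_with(name) for name in HANDLERS}

for name, text in results.items():
    # repr() keeps the lone surrogate and U+FFFD printable on any terminal.
    print(f"{name:17} {text!r}")

# The default handler, strict, raises instead of substituting anything.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"{'strict':17} raises {type(exc).__name__}")
```

ignore drops the invalid byte, replace substitutes U+FFFD, surrogateescape maps the byte to the lone surrogate U+DCFF, and backslashreplace writes the escape sequence \xff as text.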

Common platform:
Python unicode implementation: PEP 393
Timer: time.perf_counter
Platform: Linux-4.1.5-200.fc22.x86_64-x86_64-with-fedora-22-Twenty_Two
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Bits: int=32, long=64, long long=64, size_t=64, void*=64
CFLAGS: -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Timer precision: 55 ns
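
To take measurements of this kind with the same timer, something like the following could be used. It is only a sketch: the helper best_time, the loop counts, and the input layout (one invalid byte after every 9 ASCII bytes) are assumptions, and this is not the script that produced the tables below.

```python
import time


def best_time(func, loops=1000, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        elapsed = (time.perf_counter() - start) / loops
        best = min(best, elapsed)
    return best


# Hypothetical input: 9 ASCII bytes followed by one invalid byte, repeated.
data = (b"x" * 9 + b"\xff") * 100

# Same kind of information as the "Timer info" line above.
info = time.get_clock_info("perf_counter")
print("Timer info:", info)

timings = {}
for handler in ("ignore", "replace", "surrogateescape", "backslashreplace"):
    # Bind the handler name as a default argument so each lambda keeps its own.
    timings[handler] = best_time(lambda h=handler: data.decode("utf-8", h))
    print(f"{handler:17} {timings[handler] * 1e6:8.2f} us")
```

Taking the minimum over several runs reduces noise from other processes. Absolute numbers depend on the machine and build flags.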

Platform of campaign before:
SCM: hg revision=f51921883f50 tag=tip branch=default date="2015-10-04 01:19 -0400"
Python version: 3.6.0a0 (default:f51921883f50, Oct 4 2015, 10:19:37) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-04 10:19:44

Platform of campaign after:
SCM: hg revision=f51921883f50+ tag=tip branch=default date="2015-10-04 01:19 -0400"
Python version: 3.6.0a0 (default:f51921883f50+, Oct 4 2015, 10:14:05) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
Date: 2015-10-04 10:18:55

---------------------+-------------+--------
valid UTF-8 (strict) |      before |   after
---------------------+-------------+--------
100 x 10**1 bytes    |  297 ns (*) |  297 ns
100 x 10**3 bytes    |  7.4 us (*) | 7.44 us
100 x 10**2 bytes    |  929 ns (*) |  924 ns
100 x 10**4 bytes    | 80.4 us (*) | 80.4 us
---------------------+-------------+--------
Total                | 89.1 us (*) |   89 us
---------------------+-------------+--------

------------------+-------------+---------------
ignore            |      before |          after
------------------+-------------+---------------
100 x 10**1 bytes | 6.68 us (*) |  743 ns (-89%)
100 x 10**3 bytes |  561 us (*) | 42.6 us (-92%)
100 x 10**2 bytes | 56.8 us (*) | 4.55 us (-92%)
100 x 10**4 bytes | 6.02 ms (*) |  425 us (-93%)
------------------+-------------+---------------
Total             | 6.65 ms (*) |  473 us (-93%)
------------------+-------------+---------------

------------------+-------------+---------------
replace           |      before |          after
------------------+-------------+---------------
100 x 10**1 bytes | 7.61 us (*) |  890 ns (-88%)
100 x 10**3 bytes |  639 us (*) | 50.3 us (-92%)
100 x 10**2 bytes | 64.8 us (*) | 5.37 us (-92%)
100 x 10**4 bytes | 7.09 ms (*) |  505 us (-93%)
------------------+-------------+---------------
Total             | 7.81 ms (*) |  561 us (-93%)
------------------+-------------+---------------

------------------+-------------+---------------
surrogateescape   |      before |          after
------------------+-------------+---------------
100 x 10**1 bytes | 7.96 us (*) |  855 ns (-89%)
100 x 10**3 bytes |  674 us (*) | 50.2 us (-93%)
100 x 10**2 bytes | 68.8 us (*) | 5.35 us (-92%)
100 x 10**4 bytes | 7.38 ms (*) |  504 us (-93%)
------------------+-------------+---------------
Total             | 8.13 ms (*) |  560 us (-93%)
------------------+-------------+---------------

------------------+-------------+--------
backslashreplace  |      before |   after
------------------+-------------+--------
100 x 10**1 bytes | 7.66 us (*) | 7.89 us
100 x 10**3 bytes |  633 us (*) |  633 us
100 x 10**2 bytes | 64.1 us (*) | 64.6 us
100 x 10**4 bytes |  6.9 ms (*) | 6.93 ms
------------------+-------------+--------
Total             | 7.61 ms (*) | 7.64 ms
------------------+-------------+--------

---------------------+-------------+---------------
Summary              |      before |          after
---------------------+-------------+---------------
valid UTF-8 (strict) | 89.1 us (*) |          89 us
ignore               | 6.65 ms (*) |  473 us (-93%)
replace              | 7.81 ms (*) |  561 us (-93%)
surrogateescape      | 8.13 ms (*) |  560 us (-93%)
backslashreplace     | 7.61 ms (*) |        7.64 ms
---------------------+-------------+---------------
Total                | 30.3 ms (*) | 9.32 ms (-69%)
---------------------+-------------+---------------
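
The percentages in the Summary table can be rechecked from the before and after columns. The sketch below copies those values, converted to microseconds. The names summary_us and percent_change are illustrative, and the speedup factor is derived here rather than taken from the original output.

```python
# (before, after) in microseconds, copied from the Summary table.
summary_us = {
    "ignore": (6650.0, 473.0),
    "replace": (7810.0, 561.0),
    "surrogateescape": (8130.0, 560.0),
    "Total": (30300.0, 9320.0),
}


def percent_change(before, after):
    """Relative change of `after` versus `before`, rounded to a whole percent."""
    return round((after - before) / before * 100)


changes = {name: percent_change(b, a) for name, (b, a) in summary_us.items()}

for name, (before, after) in summary_us.items():
    print(f"{name:16} {changes[name]:+d}%  ({before / after:.1f}x faster)")
```

For the three optimized handlers this works out to about a 14x speedup. The total is held back by backslashreplace, which the patch does not touch.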
History
Date                 User      Action  Args
2015-10-04 08:30:32  vstinner  set     recipients: + vstinner, ezio.melotti
2015-10-04 08:30:32  vstinner  set     messageid: <1443947432.21.0.263826556264.issue25301@psf.upfronthosting.co.za>
2015-10-04 08:30:32  vstinner  link    issue25301 messages
2015-10-04 08:30:31  vstinner  create