This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: UTF-8 encoder performance regression in python3.3
Type: performance
Components: Unicode
Versions: Python 3.3

process
Status: closed
Resolution: fixed
Nosy List: ezio.melotti, flox, jcea, loewis, pitrou, python-dev, vstinner
Priority: normal
Keywords: patch

Created on 2011-12-17 18:49 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                   Uploaded
utf8_encoder-2.patch        vstinner, 2011-12-18 12:20
utf8_encoder_prescan.patch  vstinner, 2011-12-18 12:56
utf8_encoder-3.patch        vstinner, 2011-12-18 13:06
Messages (12)
msg149695 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 18:49
iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than Python 3.2. The performance depends on the characters of the input string:

 * 8x faster (!) for a string of 50,000 ASCII characters
 * 1.5x slower for a string of 50,000 UCS-1 characters
 * 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck appears to be the PyUnicode_READ() macro.

 * Python 3.2: s[i++]
 * Python 3.3: PyUnicode_READ(kind, data, i++)

Because encoding strings to UTF-8 is a very common operation, performance matters. Antoine suggests having different versions of the function for each Unicode kind (1, 2, 4).
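As a quick illustration (plain Python, not code from any patch here), the characters used in the benchmarks below encode to a different number of UTF-8 bytes per kind, which is why a single generic loop has to branch on every character:

```python
# UTF-8 byte length of the code points used in the benchmarks.
samples = [
    ("ASCII", "A"),           # U+0041   -> 1 byte
    ("UCS-1", "\xe9"),        # U+00E9   -> 2 bytes
    ("UCS-2", "\u20ac"),      # U+20AC   -> 3 bytes
    ("UCS-4", "\U0010FFFF"),  # U+10FFFF -> 4 bytes
]
for kind, ch in samples:
    print(kind, len(ch.encode("utf-8")))
```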
msg149699 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-12-17 19:50
Can you please provide your exact testing procedure? Standard iobench.py doesn't support testing for separate ASCII, UCS-1 and UCS-2 data, so you must have used some other tool. Exact code, command line parameters, hardware description and timing results would be appreciated.

Looking at the encoder, I think the first thing to change is to reduce the over-allocation for UCS-1 and UCS-2 strings. This may or may not help the run-time, but should reduce memory consumption.

I wonder whether making two passes over the string (one to compute the size, and the other one with an allocated result buffer) could improve the performance.

If there is further special-casing, I'd only special-case UCS-1. I doubt that the _READ() macro really is the bottleneck, and would rather expect that loop unrolling can help. Because of disallowed surrogates, unrolling is not practical for UCS-2.
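The two-pass idea above (first compute the exact output size, then encode into a buffer allocated once) can be sketched in Python. `utf8_len` is a hypothetical helper written for illustration, not code from any attached patch, and it assumes the input contains no surrogates (which strict UTF-8 encoding would reject anyway):

```python
def utf8_len(s):
    """First pass: exact UTF-8 size of s, without building the result."""
    n = 0
    for cp in map(ord, s):
        if cp < 0x80:
            n += 1       # ASCII
        elif cp < 0x800:
            n += 2       # rest of UCS-1 and low UCS-2
        elif cp < 0x10000:
            n += 3       # rest of UCS-2
        else:
            n += 4       # UCS-4
    return n

s = "A\xe9\u20ac\U0010FFFF"
print(utf8_len(s), len(s.encode("utf-8")))  # both 10
```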
msg149702 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 20:13
> Can you please provide your exact testing procedure?

Here you go.

$ cat bench.sh 
echo -n "ASCII: "
./python -m timeit 'x="A"*50000' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit 'x="\xe9"*50000' 'x.encode("utf-8")'
echo -n "UCS-2: "
./python -m timeit 'x="\u20ac"*50000' 'x.encode("utf-8")'
echo -n "UCS-4: "
./python -m timeit 'x="\U0010FFFF"*50000' 'x.encode("utf-8")'

Python 3.2:

ASCII: 10000 loops, best of 3: 31.5 usec per loop
UCS-1: 10000 loops, best of 3: 62.2 usec per loop
UCS-2: 10000 loops, best of 3: 91.3 usec per loop
UCS-4: 1000 loops, best of 3: 267 usec per loop

Python 3.3:

ASCII: 100000 loops, best of 3: 3.56 usec per loop
UCS-1: 10000 loops, best of 3: 98.2 usec per loop
UCS-2: 1000 loops, best of 3: 201 usec per loop
UCS-4: 10000 loops, best of 3: 168 usec per loop

Comparison:

ASCII: Python 3.3 is 8.8x faster
UCS-1: Python 3.3 is 1.6x SLOWER
UCS-2: Python 3.3 is 2.2x SLOWER
UCS-4: Python 3.3 is 1.6x faster

iobench uses more realistic data.

> Standard iobench.py doesn't support testing for separate ASCII,
> UCS-1 and UCS-2 data, so you must have used some other tool.

According to Antoine, iobench is slower because of the UTF-8 encoder.

> hardware description

i7-2600 CPU @ 3.40GHz (8 cores) with 12 GB of RAM.

> I doubt that the _READ() macro really is the bottleneck

It is the only difference between Python 3.2 and 3.3. Or did I miss something? The body of the loop is very small, so each instruction is important.
msg149703 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 20:24
Oh, Antoine told me that I missed the -s command line argument to timeit:

$ cat bench.sh 
echo -n "ASCII: "
./python -m timeit -s 'x="A"*50000' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit -s 'x="\xe9"*50000' 'x.encode("utf-8")'
echo -n "UCS-2: "
./python -m timeit -s 'x="\u20ac"*50000' 'x.encode("utf-8")'
echo -n "UCS-4: "
./python -m timeit -s 'x="\U0010FFFF"*50000' 'x.encode("utf-8")'

Python 3.2:

ASCII: 10000 loops, best of 3: 28.2 usec per loop
UCS-1: 10000 loops, best of 3: 59.1 usec per loop
UCS-2: 10000 loops, best of 3: 88.8 usec per loop
UCS-4: 1000 loops, best of 3: 254 usec per loop

Python 3.3:

ASCII: 1000000 loops, best of 3: 2.01 usec per loop
UCS-1: 10000 loops, best of 3: 95.8 usec per loop
UCS-2: 1000 loops, best of 3: 201 usec per loop
UCS-4: 10000 loops, best of 3: 151 usec per loop

The results look similar.
msg149705 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 21:19
Python 3.2 (narrow):

ASCII: 10000 loops, best of 3: 28.2 usec per loop
UCS-1: 10000 loops, best of 3: 59.1 usec per loop
UCS-2: 10000 loops, best of 3: 88.8 usec per loop
UCS-4: 1000 loops, best of 3: 254 usec per loop

Python 3.2 (wide):

ASCII: 10000 loops, best of 3: 28.5 usec per loop
UCS-1: 10000 loops, best of 3: 60.8 usec per loop
UCS-2: 10000 loops, best of 3: 114 usec per loop
UCS-4: 10000 loops, best of 3: 129 usec per loop

Python 3.3 (specialized UTF-8 encoder):

ASCII: 100000 loops, best of 3: 2 usec per loop
UCS-1: 10000 loops, best of 3: 45.4 usec per loop
UCS-2: 10000 loops, best of 3: 96.4 usec per loop
UCS-4: 10000 loops, best of 3: 140 usec per loop

Attached patch adds UTF-8 encoder for UCS1, UCS2 and UCS4.
msg149706 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-17 21:25
> 8x faster (!) for a string of 50,000 ASCII characters

Oooh, it's just faster because encoding ASCII to UTF-8 is now O(1). The ASCII data is shared with the UTF-8 data thanks to PEP 393!
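The effect is easy to observe from Python. This is a rough timing sketch (absolute numbers will vary by machine): an ASCII string, whose UTF-8 form is simply a copy of its internal representation, against a UCS-2 string that must be re-encoded character by character:

```python
import timeit

ascii_s = "A" * 50000      # compact ASCII: UTF-8 form matches the internal bytes
ucs2_s = "\u20ac" * 50000  # UCS-2: each char expands to 3 UTF-8 bytes

# Bound method ascii_s.encode defaults to UTF-8; take the best of 5 runs.
t_ascii = min(timeit.repeat(ascii_s.encode, number=1000, repeat=5))
t_ucs2 = min(timeit.repeat(ucs2_s.encode, number=1000, repeat=5))
print(f"ASCII: {t_ascii:.4f}s  UCS-2: {t_ucs2:.4f}s")
```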
msg149747 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 12:20
Updated the patch to also fix the size of the small buffer on the stack, as suggested by Antoine.
msg149748 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 12:56
utf8_encoder_prescan.patch: precompute the size of the output to avoid a PyBytes_Resize() at exit. It is much slower:

ASCII: 100000 loops, best of 3: 2.06 usec per loop
UCS-1: 10000 loops, best of 3: 123 usec per loop
UCS-2: 10000 loops, best of 3: 171 usec per loop
UCS-4: 1000 loops, best of 3: 254 usec per loop
msg149750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 13:06
Patch version 3 fixes compiler warnings (removes variables used for the error handler, which are unneeded for UCS-1).
msg149752 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-18 13:20
New changeset fbd797fc3809 by Victor Stinner in branch 'default':
Issue #13624: Write a specialized UTF-8 encoder to allow more optimization
http://hg.python.org/cpython/rev/fbd797fc3809
msg149799 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-12-18 19:31
> Oooh, it's just faster because encoding ASCII to UTF-8 is now O(1)

It's actually still O(n): the UTF-8 data still needs to be copied into a bytes object.
msg149800 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-18 19:44
> It's actually still O(n): the UTF-8 data still need to be copied
> into a bytes object.

Hum, correct, but a memory copy is much faster than having to encode to UTF-8 character by character.
History
Date                 User        Action  Args
2022-04-11 14:57:24  admin       set     github: 57833
2011-12-18 19:44:54  vstinner    set     messages: + msg149800
2011-12-18 19:31:02  loewis      set     messages: + msg149799
2011-12-18 13:25:34  vstinner    set     status: open -> closed
                                         resolution: fixed
2011-12-18 13:20:53  python-dev  set     nosy: + python-dev
                                         messages: + msg149752
2011-12-18 13:06:19  vstinner    set     files: + utf8_encoder-3.patch
                                         messages: + msg149750
2011-12-18 12:57:20  vstinner    set     files: - utf8_encoder.patch
2011-12-18 12:56:14  vstinner    set     files: + utf8_encoder_prescan.patch
                                         messages: + msg149748
2011-12-18 12:20:07  vstinner    set     files: + utf8_encoder-2.patch
                                         messages: + msg149747
2011-12-17 21:25:33  vstinner    set     messages: + msg149706
2011-12-17 21:19:15  vstinner    set     files: + utf8_encoder.patch
                                         keywords: + patch
                                         messages: + msg149705
2011-12-17 20:51:46  flox        set     nosy: + flox
2011-12-17 20:24:43  vstinner    set     messages: + msg149703
2011-12-17 20:13:53  vstinner    set     messages: + msg149702
2011-12-17 19:50:11  loewis      set     nosy: + loewis
                                         messages: + msg149699
2011-12-17 18:58:45  jcea        set     nosy: + jcea
2011-12-17 18:49:11  vstinner    create