Message 149702 - Python tracker


Author	vstinner
Recipients	ezio.melotti, jcea, loewis, pitrou, vstinner
Date	2011-12-17.20:13:53
Message-id	<1324152834.25.0.0676820511742.issue13624@psf.upfronthosting.co.za>

Content

> Can you please provide your exact testing procedure?

Here you go.

$ cat bench.sh 
echo -n "ASCII: "
./python -m timeit 'x="A"*50000' 'x.encode("utf-8")'
echo -n "UCS-1: "
./python -m timeit 'x="\xe9"*50000' 'x.encode("utf-8")'
echo -n "UCS-2: "
./python -m timeit 'x="\u20ac"*50000' 'x.encode("utf-8")'
echo -n "UCS-4: "
./python -m timeit 'x="\U0010FFFF"*50000' 'x.encode("utf-8")'
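For anyone who prefers to drive the same measurement from Python, a rough equivalent using the standard `timeit` module might look like the sketch below. The names `CASES` and `bench_encode` and the default `number=1000` are illustrative assumptions, not part of bench.sh. One difference to keep in mind: bench.sh passes both statements without `-s`, so as far as I can tell the string construction is timed on every loop as well, whereas this sketch builds the string once outside the timed call.

```python
import timeit

# Same four test characters as bench.sh: one per string storage width.
CASES = {
    "ASCII": "A",
    "UCS-1": "\xe9",
    "UCS-2": "\u20ac",
    "UCS-4": "\U0010FFFF",
}


def bench_encode(char, length=50000, number=1000, repeat=3):
    """Return the best time per text.encode("utf-8") call, in microseconds.

    The string is built once, outside the timed call (unlike bench.sh).
    """
    text = char * length
    timer = timeit.Timer(lambda: text.encode("utf-8"))
    # Best of `repeat` runs, each run doing `number` encode calls.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    for label, char in CASES.items():
        print("%s: %.1f usec per loop" % (label, bench_encode(char)))
```

The lambda adds a small constant call overhead to every measurement, so absolute numbers will not match the command-line output exactly; the relative ordering between the four cases is what matters here.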

Python 3.2:

ASCII: 10000 loops, best of 3: 31.5 usec per loop
UCS-1: 10000 loops, best of 3: 62.2 usec per loop
UCS-2: 10000 loops, best of 3: 91.3 usec per loop
UCS-4: 1000 loops, best of 3: 267 usec per loop

Python 3.3:

ASCII: 100000 loops, best of 3: 3.56 usec per loop
UCS-1: 10000 loops, best of 3: 98.2 usec per loop
UCS-2: 1000 loops, best of 3: 201 usec per loop
UCS-4: 10000 loops, best of 3: 168 usec per loop

Comparison:

ASCII: Python 3.3 is 8.8x faster
UCS-1: Python 3.3 is 1.6x SLOWER
UCS-2: Python 3.3 is 2.2x SLOWER
UCS-4: Python 3.3 is 1.6x faster
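The ratios above are just the quotient of the two best-of-3 timings for each case. A minimal sketch of that arithmetic, with the timings copied from the results in this message (the helper name `compare` is made up for illustration):

```python
# Best-of-3 timings in microseconds, copied from the results in this message.
PY32 = {"ASCII": 31.5, "UCS-1": 62.2, "UCS-2": 91.3, "UCS-4": 267.0}
PY33 = {"ASCII": 3.56, "UCS-1": 98.2, "UCS-2": 201.0, "UCS-4": 168.0}


def compare(old, new):
    """Return (ratio, verdict) describing how `new` compares to `old`."""
    if new <= old:
        return old / new, "faster"
    return new / old, "SLOWER"


for label in PY32:
    ratio, verdict = compare(PY32[label], PY33[label])
    print("%s: Python 3.3 is %.1fx %s" % (label, ratio, verdict))
```

Rounded to one decimal place, this gives the same four figures as the comparison above.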

iobench uses more realistic data.

> Standard iobench.py doesn't support testing for separate ASCII,
> UCS-1 and UCS-2 data, so you must have used some other tool.

According to Antoine, iobench is slower because of the UTF-8 encoder.

> hardware description

i7-2600 CPU @ 3.40GHz (4 cores, 8 threads) with 12 GB of RAM.

> I doubt that the _READ() macro really is the bottleneck

It is the only difference between Python 3.2 and 3.3, or did I miss something? The body of the loop is very small, so every instruction counts.
