
Author vstinner
Recipients ezio.melotti, pitrou, vstinner
Date 2011-12-17.18:49:11
Message-id <1324147751.99.0.957308589374.issue13624@psf.upfronthosting.co.za>
Content
The iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2. The performance depends on the characters of the input string:

 * 8x faster (!) for a string of 50,000 ASCII characters
 * 1.5x slower for a string of 50,000 UCS-1 characters
 * 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck appears to be the PyUnicode_READ() macro.

 * Python 3.2: s[i++]
 * Python 3.3: PyUnicode_READ(kind, data, i++)

Because encoding a string to UTF-8 is a very common operation, performance matters. Antoine suggests having different versions of the function for each Unicode kind (1, 2, 4).
History
Date                 User      Action  Args
2011-12-17 18:49:12  vstinner  set     recipients: + vstinner, pitrou, ezio.melotti
2011-12-17 18:49:11  vstinner  set     messageid: <1324147751.99.0.957308589374.issue13624@psf.upfronthosting.co.za>
2011-12-17 18:49:11  vstinner  link    issue13624 messages
2011-12-17 18:49:11  vstinner  create