
Author vstinner
Recipients ezio.melotti, pitrou, vstinner
Date 2011-12-17.18:49:11
Message-id <1324147751.99.0.957308589374.issue13624@psf.upfronthosting.co.za>
Content
The iobench benchmarking tool showed that the UTF-8 encoder is slower in Python 3.3 than in Python 3.2. The performance depends on the characters of the input string:

 * 8x faster (!) for a string of 50,000 ASCII characters
 * 1.5x slower for a string of 50,000 UCS-1 characters
 * 2.5x slower for a string of 50,000 UCS-2 characters

The bottleneck appears to be the PyUnicode_READ() macro.

 * Python 3.2: s[i++]
 * Python 3.3: PyUnicode_READ(kind, data, i++)

Because encoding a string to UTF-8 is a very common operation, performance matters. Antoine suggests having different versions of the function for each Unicode kind (1, 2, 4).
History
Date                 User      Action  Args
2011-12-17 18:49:12  vstinner  set     recipients: + vstinner, pitrou, ezio.melotti
2011-12-17 18:49:11  vstinner  set     messageid: <1324147751.99.0.957308589374.issue13624@psf.upfronthosting.co.za>
2011-12-17 18:49:11  vstinner  link    issue13624 messages
2011-12-17 18:49:11  vstinner  create