Message 252325 - Python tracker



Author	vstinner
Recipients	ezio.melotti, serhiy.storchaka, vstinner
Date	2015-10-05.12:12:21
Message-id	<1444047142.25.0.339445709578.issue25318@psf.upfronthosting.co.za>
In-reply-to

Content

A few months ago, I wrote an earlier implementation of the _PyBytesWriter API which embedded the "current pointer" inside the _PyBytesWriter structure. The problem was that GCC produced less efficient code than expected for the hot spot of the encoder.

In the new implementation (attached patch), the "current pointer" is unchanged: it's still a variable local to the encoder function. Instead, the current pointer is now passed as a *parameter* to all _PyBytesWriter *functions*.
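To illustrate the idea, here is a minimal sketch of a writer that owns the buffer while the encoder keeps the write position in a local variable. It is not the code from the attached patch: all names (`sketch_writer`, `sketch_writer_alloc`, `sketch_writer_prepare`, `sketch_encode_ascii`) and the "?!" replacement are hypothetical, and plain `malloc`/`realloc` stand in for whatever the real writer uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer: it owns the buffer, but NOT the write position. */
typedef struct {
    char *buf;    /* start of the heap buffer */
    size_t size;  /* allocated size in bytes */
} sketch_writer;

/* Allocate at least `size` bytes; return the initial write position,
   or NULL on allocation failure. */
static char *
sketch_writer_alloc(sketch_writer *w, size_t size)
{
    w->size = size ? size : 1;
    w->buf = malloc(w->size);
    return w->buf;
}

/* Make sure `extra` bytes can be written starting at `p`.
   The current pointer is a parameter; the (possibly relocated)
   pointer is returned, or NULL on allocation failure.
   Size overflow is not checked in this sketch. */
static char *
sketch_writer_prepare(sketch_writer *w, char *p, size_t extra)
{
    size_t used = (size_t)(p - w->buf);
    if (extra <= w->size - used)
        return p;
    char *newbuf = realloc(w->buf, used + extra);
    if (newbuf == NULL)
        return NULL;
    w->buf = newbuf;
    w->size = used + extra;
    return newbuf + used;
}

/* Toy encoder: ASCII bytes are copied, any other byte is replaced
   by the two bytes "?!" (a stand-in for an error handler). */
static char *
sketch_encode_ascii(const unsigned char *in, size_t len, size_t *outlen)
{
    sketch_writer w;
    char *p = sketch_writer_alloc(&w, len);  /* one byte per input byte */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 128) {
            *p++ = (char)in[i];  /* hot path: touches only the local p */
        }
        else {
            /* slow path: 2 bytes now, plus 1 per remaining input byte */
            char *q = sketch_writer_prepare(&w, p, 2 + (len - i - 1));
            if (q == NULL) {
                free(w.buf);
                return NULL;
            }
            p = q;
            *p++ = '?';
            *p++ = '!';
        }
    }
    *outlen = (size_t)(p - w.buf);
    return w.buf;  /* caller frees */
}
```

The point of this layout is that the loop body for valid input only reads and increments a local `char *`, which the compiler can keep in a register; the writer structure is touched only on the slow path.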

I expect no change in the performance of encoders for strings that encode without errors (when the code calling error handlers is not used), which is important since we have very good performance here.

_PyBytesWriter is not restricted to the code to allocate the buffer.

--

bytes_writer.patch:

+    char stackbuf[256];

Oh, I forgot to mention this other small optimization. I also added a small buffer allocated on the C stack for the UCS1 encoder (ASCII, Latin1). It may slightly speed up encoding when the error handler is used and the output string is smaller than 256 bytes.

This optimization is borrowed from the very efficient UTF-8 encoder.
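As a rough illustration of the stack buffer idea, the sketch below builds small outputs in a 256-byte array on the C stack and only falls back to the heap when the worst-case output would not fit. It is not taken from the patch: `sketch_encode_small`, `SKETCH_STACKBUF_SIZE`, the `used_stack` flag and the "?!" replacement are all hypothetical, and the worst-case sizing is an assumption made to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACKBUF_SIZE 256  /* same size as stackbuf in the patch line */

/* Toy encoder: ASCII bytes are copied, any other byte becomes "?!".
   Returns a malloc'ed result (caller frees) and sets *outlen.
   *used_stack reports which working buffer was chosen. */
static char *
sketch_encode_small(const unsigned char *in, size_t len,
                    size_t *outlen, int *used_stack)
{
    char stackbuf[SKETCH_STACKBUF_SIZE];
    size_t need = 2 * len;  /* worst case: every byte is replaced */
    char *work = stackbuf;
    int on_stack = 1;

    if (need > sizeof(stackbuf)) {
        /* too large for the stack buffer: use the heap from the start */
        work = malloc(need);
        if (work == NULL)
            return NULL;
        on_stack = 0;
    }

    char *p = work;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 128) {
            *p++ = (char)in[i];
        }
        else {
            *p++ = '?';
            *p++ = '!';
        }
    }

    size_t n = (size_t)(p - work);
    char *result;
    if (on_stack) {
        /* one exact-size allocation, no intermediate resize */
        result = malloc(n ? n : 1);
        if (result == NULL)
            return NULL;
        memcpy(result, work, n);
    }
    else {
        /* shrink the heap buffer to the final size */
        result = realloc(work, n ? n : 1);
        if (result == NULL)
            result = work;  /* shrinking failed: the original is still valid */
    }
    *outlen = n;
    *used_stack = on_stack;
    return result;
}
```

With this layout a short output costs a single allocation of the final size, while longer outputs pay for an allocation followed by a shrink.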

History
Date	User	Action	Args
2015-10-05 12:12:22	vstinner	set	recipients: + vstinner, ezio.melotti, serhiy.storchaka
2015-10-05 12:12:22	vstinner	set	messageid: <1444047142.25.0.339445709578.issue25318@psf.upfronthosting.co.za>
2015-10-05 12:12:22	vstinner	link	issue25318 messages
2015-10-05 12:12:22	vstinner	create