Issue 16147: Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60351

classification

Title:	Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API
Type:	performance	Stage:
Components:		Versions:	Python 3.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	eric.smith, python-dev, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-10-05 20:44 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
unicode_fromformat.patch	vstinner, 2012-10-05 20:44

Messages (4)
msg172138	Author: STINNER Victor (vstinner) *	Date: 2012-10-05 20:44
Attached patch rewrites PyUnicode_FromFormatV():

 * Simplify the code: replace 4 steps with a single step. PyUnicode_Format() has the same design. This avoids storing intermediate results, which required allocating an array of pointers on the heap.
 * Use the _PyUnicodeWriter API for speed (and its convenient API): overallocate the buffer to reduce the number of realloc() calls.
 * Implement "width" and "precision" in Python's own code instead of relying on sprintf(). This avoids the need for a temporary buffer allocated on the heap: only a small buffer allocated on the stack is used.
 * Detect integer overflow when parsing width and precision, as done in PyUnicode_Format().
 * Add the _PyUnicodeWriter_WriteCstr() function.
 * Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). This requires copying vargs using Py_VA_COPY: without Py_VA_COPY, the function crashes. I don't understand why.
 * Inline parse_format_flags(): the format of an argument is now parsed only once, so a subfunction is no longer needed.
 * Optimize PyUnicode_FromFormatV() for characters between two arguments: search for the next "%" and copy the substring in one chunk, instead of copying character by character.
 * Replace "prec too big" with "precision too big" in error messages.

_testcapi.test_string_from_format() is 20% faster with the patch according to timeit. I don't know how to write better benchmarks because PyUnicode_FromFormatV() is not exposed in Python. I wrote a benchmark using ctypes to call the function, but it looks like the ctypes overhead is too high.

I wrote the patch to simplify the code, but it may be faster thanks to the _PyUnicodeWriter API and some optimizations implemented in the patch.
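
The chunked copy of the text between two "%" directives, combined with buffer overallocation, can be sketched in plain C. This is only an illustration under assumptions: buf_t, buf_append() and copy_literals() are hypothetical names, the buffer holds bytes rather than Unicode code points, every directive is assumed to be exactly two characters long, and none of this is the _PyUnicodeWriter API or the code from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (a stand-in, not _PyUnicodeWriter). */
typedef struct {
    char *data;
    size_t len;    /* bytes used */
    size_t cap;    /* bytes allocated */
    int reallocs;  /* number of realloc() calls, kept for demonstration */
} buf_t;

/* Append n bytes, overallocating so that later appends seldom realloc(). */
static int buf_append(buf_t *b, const char *s, size_t n)
{
    assert(b != NULL && s != NULL);
    if (b->len + n > b->cap) {
        size_t newcap = (b->len + n) * 2;  /* grow to twice what is needed */
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    return 0;
}

/* Copy only the literal text of fmt. Each literal run is located with
   strchr() and copied in one chunk instead of character by character.
   Simplification: a directive is treated as '%' plus one character. */
static int copy_literals(buf_t *b, const char *fmt)
{
    while (*fmt != '\0') {
        if (*fmt == '%') {
            /* A real formatter would parse and format an argument here. */
            fmt += (fmt[1] != '\0') ? 2 : 1;
            continue;
        }
        const char *next = strchr(fmt, '%');
        size_t n = (next != NULL) ? (size_t)(next - fmt) : strlen(fmt);
        if (buf_append(b, fmt, n) < 0)
            return -1;
        fmt += n;
    }
    return 0;
}
```

Starting from a zero-initialized buf_t, a call such as copy_literals(&b, "abc%ddef%sghi") appends three chunks of three bytes each; the caller owns b.data and releases it with free().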
msg172149	Author: STINNER Victor (vstinner) *	Date: 2012-10-05 21:22
"Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why."

Ok, here is the answer:
http://stackoverflow.com/questions/8047362/is-gcc-mishandling-a-pointer-to-a-va-list-passed-to-a-function

In short: va_list can be an array (of 1 element) on some platforms (e.g. AMD64).
msg172247	Author: Roundup Robot (python-dev)	Date: 2012-10-06 21:18
New changeset b4bee17625e1 by Victor Stinner in branch 'default':
Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
http://hg.python.org/cpython/rev/b4bee17625e1

New changeset d1369daeb9ec by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
http://hg.python.org/cpython/rev/d1369daeb9ec

New changeset 5e319fdab563 by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() now raises an error if the argument of
http://hg.python.org/cpython/rev/5e319fdab563
msg172252	Author: Roundup Robot (python-dev)	Date: 2012-10-06 21:59
New changeset e16ec3b468d1 by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
http://hg.python.org/cpython/rev/e16ec3b468d1

History
Date	User	Action	Args
2022-04-11 14:57:36	admin	set	github: 60351
2012-10-06 21:59:36	python-dev	set	messages: + msg172252
2012-10-06 21:24:02	vstinner	set	status: open -> closed resolution: fixed
2012-10-06 21:18:31	python-dev	set	nosy: + python-dev messages: + msg172247
2012-10-06 20:26:06	pitrou	set	nosy: + eric.smith
2012-10-05 21:22:50	vstinner	set	messages: + msg172149
2012-10-05 20:44:19	vstinner	create