classification
Title: Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API
Type: performance Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2012-10-05 20:44 by vstinner, last changed 2012-10-06 21:59 by python-dev. This issue is now closed.

Files
File name Uploaded Description
unicode_fromformat.patch vstinner, 2012-10-05 20:44
Messages (4)
msg172138 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-05 20:44
Attached patch rewrites PyUnicode_FromFormatV():
 * Simplify the code: replace the 4 steps with a single step. PyUnicode_Format() has the same design. This avoids storing intermediate results, which required allocating an array of pointers on the heap.
 * Use the _PyUnicodeWriter API for speed (and for its convenient API): overallocate the buffer to reduce the number of realloc() calls.
 * Implement "width" and "precision" in CPython itself instead of relying on sprintf(). This avoids the need for a temporary buffer allocated on the heap: only a small buffer allocated on the stack is used.
 * Detect integer overflow when parsing width and precision, as done in PyUnicode_Format()
 * Add _PyUnicodeWriter_WriteCstr() function
 * Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). This requires copying vargs using Py_VA_COPY: without Py_VA_COPY, the function crashes. I don't understand why.
 * Inline parse_format_flags(): the format of an argument is now parsed only once, so a separate subfunction is no longer needed.
 * Optimize PyUnicode_FromFormatV() for literal characters between two arguments: search for the next "%" and copy the substring in one chunk, instead of copying character by character.
 * Replace "prec too big" with "precision too big" in error messages
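
Three of the ideas in the list above (overallocating the output buffer, copying the literal text up to the next "%" in one chunk, and detecting integer overflow while parsing a width or precision) can be sketched in plain C. This is only an illustration under stated assumptions: toy_writer, toy_write(), toy_write_literal() and parse_number() are hypothetical names invented here, the growth policy is arbitrary, and none of this is the code from the patch or the real _PyUnicodeWriter API.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a writer object; NOT the real _PyUnicodeWriter. */
typedef struct {
    char *buf;
    size_t len;   /* bytes used */
    size_t cap;   /* bytes allocated */
} toy_writer;

/* Append n bytes, overallocating so that repeated appends trigger few
   realloc() calls. Returns 0 on success, -1 on memory error. */
static int
toy_write(toy_writer *w, const char *s, size_t n)
{
    assert(w != NULL && w->len <= w->cap);
    if (n == 0)
        return 0;
    if (n > w->cap - w->len) {
        size_t need = w->len + n;
        size_t newcap = need + need / 4 + 16;   /* assumed growth policy */
        char *p = realloc(w->buf, newcap);
        if (p == NULL)
            return -1;
        w->buf = p;
        w->cap = newcap;
    }
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    return 0;
}

/* Copy the literal text up to the next '%' (or the end of the string) in
   one chunk. Returns a pointer to the '%' or to the terminating NUL, or
   NULL on memory error. */
static const char *
toy_write_literal(toy_writer *w, const char *fmt)
{
    const char *next = strchr(fmt, '%');
    size_t n = (next != NULL) ? (size_t)(next - fmt) : strlen(fmt);

    if (toy_write(w, fmt, n) < 0)
        return NULL;
    return fmt + n;
}

/* Parse a run of decimal digits at *p into *value. Returns 0 on success
   (and advances *p past the digits), -1 if the number does not fit in
   an int. */
static int
parse_number(const char **p, int *value)
{
    const char *s = *p;
    int n = 0;

    while (*s >= '0' && *s <= '9') {
        int digit = *s - '0';
        /* Check before computing: n * 10 + digit must not exceed INT_MAX. */
        if (n > (INT_MAX - digit) / 10)
            return -1;
        n = n * 10 + digit;
        s++;
    }
    *p = s;
    *value = n;
    return 0;
}
```

A caller would start from a zero-initialized toy_writer, alternate between toy_write_literal() and handling one "%" argument, and free() the buffer at the end.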

_testcapi.test_string_from_format() is 20% faster with the patch according to timeit. I don't know how to write better benchmarks because PyUnicode_FromFormatV() is not exposed in Python. I wrote a benchmark using ctypes to call the function, but it looks like the ctypes overhead is too high.

I wrote the patch to simplify the code, but it may also be faster thanks to the _PyUnicodeWriter API and some optimizations implemented in the patch.
msg172149 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-05 21:22
"Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why."

Ok, here is the answer.
http://stackoverflow.com/questions/8047362/is-gcc-mishandling-a-pointer-to-a-va-list-passed-to-a-function

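In short: va_list can be an array type (of 1 element) on some platforms (e.g. AMD64). A va_list function parameter then decays to a pointer, so taking its address inside the function does not yield a pointer to a va_list.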
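
A minimal sketch of the portable pattern, assuming standard C99 va_copy() rather than CPython's Py_VA_COPY macro: copy the va_list parameter into a local variable, which has the real va_list type, and pass the address of that copy to the helper. The names next_int(), sum_v() and sum() are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: consume one int argument through a va_list pointer. */
static int
next_int(va_list *vargs)
{
    return va_arg(*vargs, int);
}

static int
sum_v(int count, va_list vargs)
{
    va_list copy;
    int total = 0;
    int i;

    assert(count >= 0);
    /* `vargs` is a parameter: where va_list is an array type it has decayed
       to a pointer, so &vargs would not be a va_list*. The local copy has
       the real va_list type, so &copy can be passed on safely. */
    va_copy(copy, vargs);
    for (i = 0; i < count; i++)
        total += next_int(&copy);
    va_end(copy);
    return total;
}

static int
sum(int count, ...)
{
    va_list vargs;
    int total;

    va_start(vargs, count);
    total = sum_v(count, vargs);
    va_end(vargs);
    return total;
}
```

Passing &vargs directly from sum_v() may happen to work on platforms where va_list is a plain pointer or a struct, which is why the problem only shows up on some architectures.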
msg172247 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-06 21:18
New changeset b4bee17625e1 by Victor Stinner in branch 'default':
Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
http://hg.python.org/cpython/rev/b4bee17625e1

New changeset d1369daeb9ec by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
http://hg.python.org/cpython/rev/d1369daeb9ec

New changeset 5e319fdab563 by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() now raises an error if the argument of
http://hg.python.org/cpython/rev/5e319fdab563
msg172252 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-06 21:59
New changeset e16ec3b468d1 by Victor Stinner in branch 'default':
Issue #16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
http://hg.python.org/cpython/rev/e16ec3b468d1
History
Date User Action Args
2012-10-06 21:59:36 python-dev set messages: + msg172252
2012-10-06 21:24:02 vstinner set status: open -> closed; resolution: fixed
2012-10-06 21:18:31 python-dev set nosy: + python-dev; messages: + msg172247
2012-10-06 20:26:06 pitrou set nosy: + eric.smith
2012-10-05 21:22:50 vstinner set messages: + msg172149
2012-10-05 20:44:19 vstinner create