Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

vstinner · 2012-10-05T20:44:20Z

BPO	16147
Nosy	@vstinner, @ericvsmith
Files	unicode_fromformat.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-10-06.21:24:02.873>
created_at = <Date 2012-10-05.20:44:19.806>
labels = ['performance']
title = 'Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API'
updated_at = <Date 2012-10-06.21:59:36.045>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2012-10-06.21:59:36.045>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2012-10-06.21:24:02.873>
closer = 'vstinner'
components = []
creation = <Date 2012-10-05.20:44:19.806>
creator = 'vstinner'
dependencies = []
files = ['27440']
hgrepos = []
issue_num = 16147
keywords = ['patch']
message_count = 4.0
messages = ['172138', '172149', '172247', '172252']
nosy_count = 3.0
nosy_names = ['vstinner', 'eric.smith', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue16147'
versions = ['Python 3.4']

vstinner · 2012-10-05T20:44:16Z

Attached patch rewrites PyUnicode_FromFormatV():

simplify the code: replace 4 steps with one unique step. PyUnicode_Format() has the same design. It avoids to store intermediate results which require to allocate an array of pointers in the heap.
use the _PyUnicodeWriter API for speed (and its convinient API): overallocate the buffer to reduce the number of "realloc()"
Implement "width" and "precision" in Python, don't rely on sprintf(). It avoids to need of a temporary buffer allocated on the heap: only use a small buffer allocated in the stack.
Detect integer overflow when parsing width and precision, as done in PyUnicode_Format()
Add _PyUnicodeWriter_WriteCstr() function
Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why.
Inline parse_format_flags(): the format of an argument is now only parsed once, it's no more needed to have a subfunction.
Optimize PyUnicode_FromFormatV() for characters between two arguments: search the next "%" and copy the substring in one chunk, instead of copying character per character.
Replace "prec too big" with "precision too big" in error messages

_tescapi.test_string_from_format() is 20% faster with the patch according to timeit. I don't know how to write better benchmarks because PyUnicode_FromV() is not exposed in Python. I wrote a benchmark using ctypes to call the function, but it looks like the ctypes overhead is too high.

I wrote the patch to simplify the code, but it may be faster thanks to the _PyUnicodeWriter API and some optimizations implemented in the patch.

vstinner · 2012-10-05T21:22:50Z

"Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why."

Ok, here is the answer.
http://stackoverflow.com/questions/8047362/is-gcc-mishandling-a-pointer-to-a-va-list-passed-to-a-function

In short: va_list can be an array (of 1 element) on some platforms (ex: AMD64).

python-dev · 2012-10-06T21:18:31Z

New changeset b4bee17625e1 by Victor Stinner in branch 'default':
Issue bpo-16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
http://hg.python.org/cpython/rev/b4bee17625e1

New changeset d1369daeb9ec by Victor Stinner in branch 'default':
Issue bpo-16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
http://hg.python.org/cpython/rev/d1369daeb9ec

New changeset 5e319fdab563 by Victor Stinner in branch 'default':
Issue bpo-16147: PyUnicode_FromFormatV() now raises an error if the argument of
http://hg.python.org/cpython/rev/5e319fdab563

python-dev · 2012-10-06T21:59:36Z

New changeset e16ec3b468d1 by Victor Stinner in branch 'default':
Issue bpo-16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
http://hg.python.org/cpython/rev/e16ec3b468d1

vstinner added the performance Performance or resource usage label Oct 5, 2012

vstinner closed this as completed Oct 6, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

vstinner commented Oct 5, 2012

vstinner commented Oct 5, 2012

vstinner commented Oct 5, 2012

python-dev mannequin commented Oct 6, 2012

python-dev mannequin commented Oct 6, 2012

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

Comments

vstinner commented Oct 5, 2012

vstinner commented Oct 5, 2012

vstinner commented Oct 5, 2012

python-dev mannequin commented Oct 6, 2012

python-dev mannequin commented Oct 6, 2012