Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

Closed
vstinner opened this issue Oct 5, 2012 · 4 comments
Closed

Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API #60351

vstinner opened this issue Oct 5, 2012 · 4 comments
Labels
performance Performance or resource usage

Comments

@vstinner
Copy link
Member

vstinner commented Oct 5, 2012

BPO 16147
Nosy @vstinner, @ericvsmith
Files
  • unicode_fromformat.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-10-06.21:24:02.873>
    created_at = <Date 2012-10-05.20:44:19.806>
    labels = ['performance']
    title = 'Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API'
    updated_at = <Date 2012-10-06.21:59:36.045>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-10-06.21:59:36.045>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-10-06.21:24:02.873>
    closer = 'vstinner'
    components = []
    creation = <Date 2012-10-05.20:44:19.806>
    creator = 'vstinner'
    dependencies = []
    files = ['27440']
    hgrepos = []
    issue_num = 16147
    keywords = ['patch']
    message_count = 4.0
    messages = ['172138', '172149', '172247', '172252']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'eric.smith', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue16147'
    versions = ['Python 3.4']

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 5, 2012

    Attached patch rewrites PyUnicode_FromFormatV():

    • simplify the code: replace 4 steps with one unique step. PyUnicode_Format() has the same design. It avoids to store intermediate results which require to allocate an array of pointers in the heap.
    • use the _PyUnicodeWriter API for speed (and its convinient API): overallocate the buffer to reduce the number of "realloc()"
    • Implement "width" and "precision" in Python, don't rely on sprintf(). It avoids to need of a temporary buffer allocated on the heap: only use a small buffer allocated in the stack.
    • Detect integer overflow when parsing width and precision, as done in PyUnicode_Format()
    • Add _PyUnicodeWriter_WriteCstr() function
    • Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why.
    • Inline parse_format_flags(): the format of an argument is now only parsed once, it's no more needed to have a subfunction.
    • Optimize PyUnicode_FromFormatV() for characters between two arguments: search the next "%" and copy the substring in one chunk, instead of copying character per character.
    • Replace "prec too big" with "precision too big" in error messages

    _tescapi.test_string_from_format() is 20% faster with the patch according to timeit. I don't know how to write better benchmarks because PyUnicode_FromV() is not exposed in Python. I wrote a benchmark using ctypes to call the function, but it looks like the ctypes overhead is too high.

    I wrote the patch to simplify the code, but it may be faster thanks to the _PyUnicodeWriter API and some optimizations implemented in the patch.

    @vstinner vstinner added the performance Performance or resource usage label Oct 5, 2012
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 5, 2012

    "Split PyUnicode_FromFormatV() into smaller functions: add unicode_fromformat_arg(). It requires to copy vargs using Py_VA_COPY: without Py_VA_COPY, the function does crash. I don't understand why."

    Ok, here is the answer.
    http://stackoverflow.com/questions/8047362/is-gcc-mishandling-a-pointer-to-a-va-list-passed-to-a-function

    In short: va_list can be an array (of 1 element) on some platforms (ex: AMD64).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 6, 2012

    New changeset b4bee17625e1 by Victor Stinner in branch 'default':
    Issue bpo-16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
    http://hg.python.org/cpython/rev/b4bee17625e1

    New changeset d1369daeb9ec by Victor Stinner in branch 'default':
    Issue bpo-16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
    http://hg.python.org/cpython/rev/d1369daeb9ec

    New changeset 5e319fdab563 by Victor Stinner in branch 'default':
    Issue bpo-16147: PyUnicode_FromFormatV() now raises an error if the argument of
    http://hg.python.org/cpython/rev/5e319fdab563

    @vstinner vstinner closed this as completed Oct 6, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 6, 2012

    New changeset e16ec3b468d1 by Victor Stinner in branch 'default':
    Issue bpo-16147: PyUnicode_FromFormatV() doesn't need anymore to allocate a buffer
    http://hg.python.org/cpython/rev/e16ec3b468d1

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant