Author vstinner
Recipients loewis, pitrou, python-dev, serhiy.storchaka, vstinner
Date 2012-05-02.23:42:39
Message-id <1336002160.81.0.443844484376.issue14687@psf.upfronthosting.co.za>
Content
pyunicode_format_writer.patch: a new, completely different approach. It's an optimistic patch: start with a short ASCII buffer, grow the buffer slowly, and convert to UCS2 and maybe to UCS4 if needed. The UTF-8 decoder is based on the same idea.

The patch adds a "unicode writer", the optimistic writer. It overallocates the buffer by 50% to limit the number of calls to PyUnicode_Resize(). It may be reused by other functions.
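
To illustrate the idea, here is a rough Python model of such a writer. It is only a sketch, not the code from the patch (which is C): the class name, method names, the initial size and the use of the array module are all assumptions made for the example. It starts with a 1-byte buffer, widens to 2 or 4 bytes per character only when a wider character shows up, and overallocates by 50% whenever it has to grow.

```python
from array import array

# Hypothetical model of an "optimistic" writer; names are illustrative only.
# Storage width in bytes -> array typecode ('L' is at least 4 bytes wide).
_TYPECODES = {1: "B", 2: "H", 4: "L"}


class OptimisticWriter:
    def __init__(self, initial_size=16):
        # Optimistic start: assume 1 byte per character is enough.
        self.kind = 1
        self.buf = array("B", [0] * initial_size)
        self.pos = 0          # number of characters written so far
        self.resizes = 0      # how many times the buffer had to grow
        self.widenings = 0    # how many times the storage was widened

    def _prepare(self, length, maxchar):
        # Widen the storage only if the new text does not fit the current kind.
        need_kind = 1 if maxchar < 0x100 else 2 if maxchar < 0x10000 else 4
        if need_kind > self.kind:
            self.buf = array(_TYPECODES[need_kind], self.buf.tolist())
            self.kind = need_kind
            self.widenings += 1
        # Grow with 50% overallocation to limit the number of resizes.
        needed = self.pos + length
        if needed > len(self.buf):
            new_size = needed + needed // 2
            self.buf.extend([0] * (new_size - len(self.buf)))
            self.resizes += 1

    def write(self, text):
        if not text:
            return
        self._prepare(len(text), max(map(ord, text)))
        for ch in text:
            self.buf[self.pos] = ord(ch)
            self.pos += 1

    def finish(self):
        # Drop the unused overallocated tail and build the final string.
        return "".join(map(chr, self.buf[:self.pos]))


writer = OptimisticWriter(initial_size=4)
for part in ("abc", "d\xe9f", "\u20ac", "\U0001F600"):
    writer.write(part)
print(writer.finish(), writer.kind, writer.resizes, writer.widenings)
```

In the common all-ASCII case the buffer is never widened, which is the case the optimistic approach bets on.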

My dummy benchmark script:
------------
$ cat ~/bench.sh 
./python -m timeit \
    -s 'fmt="%s:"; arg="abc"' \
    'fmt % arg'
./python -m timeit \
    -s 'N=200; L=3; fmt="%s"*N; args=("a"*L,)*N' \
    'fmt % args'
./python -m timeit \
    -s 's="x=%s, y=%u, z=%x"; args=(123, 456, 789)' \
    's%args'
./python -m timeit \
    -s 's="The %(k1)s is %(k2)s the %(k3)s."; args={"k1":"x","k2":"y","k3":"z",}' \
    's%args'
------------

Results.

Python 3.2:

10000000 loops, best of 3: 0.0916 usec per loop
100000 loops, best of 3: 4.04 usec per loop
1000000 loops, best of 3: 0.492 usec per loop
1000000 loops, best of 3: 0.305 usec per loop

Python 3.3:

10000000 loops, best of 3: 0.169 usec per loop
100000 loops, best of 3: 8.02 usec per loop
1000000 loops, best of 3: 0.648 usec per loop
1000000 loops, best of 3: 0.658 usec per loop

Python 3.3 with the optimistic patch (compared to unpatched 3.3):

10000000 loops, best of 3: 0.123 usec per loop (-27%)
100000 loops, best of 3: 5.73 usec per loop (-29%)
1000000 loops, best of 3: 0.466 usec per loop (-28%)
1000000 loops, best of 3: 0.454 usec per loop (-31%)

Overhead of PEP 393 (Python 3.2 => 3.3) for each of the four benchmarks, without the patch -> with the patch:

 * 85% -> 35%
 * 99% -> 41%
 * 31% -> -5% (Python 3.3 is *faster* on this specific case! maybe thanks to f4837725c50f)
 * 115% -> 49%
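
For reference, the percentages can be recomputed from the timings listed above with a few lines of Python. This is just a convenience sketch; the variable and function names are made up for the example, and the computed values match the figures quoted here to within one percentage point (the quoted figures are not all rounded the same way).

```python
# Best-of-3 timings in usec per loop, copied from the results above.
py32 = [0.0916, 4.04, 0.492, 0.305]
py33 = [0.169, 8.02, 0.648, 0.658]
py33_patched = [0.123, 5.73, 0.466, 0.454]


def percent_change(new, old):
    """Relative change of `new` compared to `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Patched 3.3 compared to unpatched 3.3 (negative means faster).
speedup = [percent_change(p, u) for p, u in zip(py33_patched, py33)]
# PEP 393 overhead relative to 3.2, without and with the patch.
overhead_before = [percent_change(u, b) for u, b in zip(py33, py32)]
overhead_after = [percent_change(p, b) for p, b in zip(py33_patched, py32)]

for s, before, after in zip(speedup, overhead_before, overhead_after):
    print("patch: %+.1f%%   overhead: %+.1f%% -> %+.1f%%" % (s, before, after))
```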

--

The "%(name)s" syntax is still *much* slower than in Python 3.2, and I don't understand why.

Parameters of the Unicode writer (overallocation factor and initial size) may be adjusted (later?) for better performance.
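
To get a feeling for the trade-off, here is a small simulation that counts buffer resizes for a given overallocation factor. It is a toy model, not a measurement of the patch: the simulate() function and the initial size of 16 are assumptions, and the workload mimics the second benchmark (200 chunks of 3 characters).

```python
def simulate(total_chars, chunk, initial_size, overalloc):
    """Return (resizes, final_capacity) when appending `chunk` chars at a time.

    Hypothetical model: on overflow, the buffer is resized to the needed
    size plus `overalloc` times that size.
    """
    capacity = initial_size
    used = 0
    resizes = 0
    while used < total_chars:
        used += chunk
        if used > capacity:
            capacity = used + int(used * overalloc)
            resizes += 1
    return resizes, capacity


# 200 chunks of 3 characters, like fmt="%s"*200 with 3-character arguments.
for factor in (0.0, 0.25, 0.5, 1.0):
    resizes, capacity = simulate(600, 3, 16, factor)
    print("overalloc=%.2f: %d resizes, final capacity %d"
          % (factor, resizes, capacity))
```

A larger factor means fewer resizes but more unused memory at the end, which has to be trimmed when the final string is built.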