This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyUnicodeWriter: change the overallocation factor for Windows
Type: performance Stage:
Components: Windows Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: pitrou, python-dev, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2013-11-14 09:37 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
writer_overallocate_factor.patch vstinner, 2013-11-14 09:37 review
Messages (8)
msg202823 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-14 09:37
PyUnicodeWriter currently overallocates the internal buffer by 25%. On Windows, PyUnicodeWriter is slower than PyAccu API. With an overallocation factor of 50%, PyUnicodeWriter is fastter.

See this message for the benchmark:
http://bugs.python.org/issue19513#msg202312

We might also change the factor on all platform, performances are almost the same with a factor of 25% or 50% on Linux.
msg202835 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-14 12:12
Your patch implies that the two only supported OSes are Linux and Windows :-) To be honest I think we should have one single overallocation factor for all OSes. If 50% is ok on Linux too, then let it be 50%.

(did you run stringbench to see if it made a difference?)
msg202849 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-14 15:26
Many collections in other programming languages use overallocating rate 100%. Perhaps overallocating rate 12.5% for Python lists is good compromise between speed and memory usage (as far as Python lists have no explicit resize method), but for short-living objects we can use larger overallocating rate.

In theory for overallocating rate r long list needs in average O(1/log(1+r)) reallocations and O((r+1)/r) copyings per element. I.e. 50% overallocating rate is 3-3.4 times more efficient than current 12.5% overallocating rate and 100% overallocating rate is 1.5-1.7 times more efficient than 50% overallocating rate (in terms of numbers of reallocations and copyings).

I'm interesting what will happened when increase overallocating rate (50% or 100%) for PyAccu.
msg202851 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-14 15:31
I chose 25% on Linux after some micro-benchmarks on str%args and str.format(args). If the buffer is too large, the final resize (because PyUnicodeObject must have the exact size) is slow. I suppose that realloc() can avoid copying data if the new is is very close, but has to allocate a new memory block and copy data if the new size is higher than a threshold. It's how _PyObject_Realloc() for example.
msg202855 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-14 16:04
Did you find a difference for small strings vs. large strings?
msg202861 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-14 16:59
> Did you find a difference for small strings vs. large strings?

When I replaced PyAccu with PyUnicodeWriter for str%args and str.format(args), I ran a lot of benchmarks with short, medium and large strings. See for example:
http://bugs.python.org/file25687/REPORT_64BIT_2.7_3.2_writer

See issues #14716 and #14744 for old benchmark results. If I remember correctly, this is the script used to run the benchmark:

https://bitbucket.org/haypo/misc/src/7c2deb7a37353b41a45564ce6a98e07bbe0c691b/python/bench_str.py

The script should be run using:

https://bitbucket.org/haypo/misc/src/7c2deb7a37353b41a45564ce6a98e07bbe0c691b/python/benchmark.py?at=default

I was concerned by performances on short strings because most calls to str%args are short strings.
msg203265 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-18 09:28
> Your patch implies that the two only supported OSes are Linux and Windows :-)

It more means that Windows memory allocator is different to the one used on all other operating systems.

Well, if you are not convinced, we can keep the overallocation factor of 25%: performances are not so bad. The difference between 25% and 50% is low.

--

Benchmark on patched repr(list) (to use PyUnicodeWriter, see issue #19513) using different overallocation factors on Linux. The best factor is 25% (the current factor).


Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
SCM: hg revision=00348c0518f8+ tag=tip branch=default date="2013-11-18 10:04 +0100"
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python version: 3.4.0a4+ (default:00348c0518f8+, Nov 18 2013, 10:11:42) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
Timer precision: 40 ns

-----------------------------+-------------+----------------+----------------+--------------
Tests                        |   writer-25 |    writer-12.5 |      writer-50 |    writer-100
-----------------------------+-------------+----------------+----------------+--------------
list("a")                    |  247 ns (*) |  272 ns (+10%) |   265 ns (+7%) | 283 ns (+14%)
list("abc")                  |  435 ns (*) |   407 ns (-7%) |         430 ns |        427 ns
["a"]*(100)                  | 8.26 us (*) |         8.4 us |  8.76 us (+6%) |       7.93 us
["abc"]*(100)                | 7.97 us (*) | 8.75 us (+10%) | 9.19 us (+15%) |       8.37 us
["a" * 100]*(100)            | 35.7 us (*) |        36.8 us |        36.6 us |       35.6 us
["a"]*(10**6)                |   73 ms (*) |  77.4 ms (+6%) | 84.9 ms (+16%) | 78.3 ms (+7%)
["abc"]*(10**6)              | 76.6 ms (*) |  81.1 ms (+6%) | 90.3 ms (+18%) | 82.4 ms (+8%)
["a" * 100]*(10**5)          | 35.1 ms (*) |        35.3 ms |        35.8 ms | 37.6 ms (+7%)
list(range(10**6))           | 93.4 ms (*) |  102 ms (+10%) |  103 ms (+10%) | 105 ms (+12%)
list(map(str, range(10**6))) |   97 ms (*) |        94.1 ms |        99.7 ms | 91.2 ms (-6%)
-----------------------------+-------------+----------------+----------------+--------------
Total                        |  375 ms (*) |         390 ms |  414 ms (+10%) |  394 ms (+5%)
-----------------------------+-------------+----------------+----------------+--------------
msg203320 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-18 20:19
New changeset 093b9838a41c by Victor Stinner in branch 'default':
Issue #19581: Change the overallocation factor of _PyUnicodeWriter on Windows
http://hg.python.org/cpython/rev/093b9838a41c
History
Date User Action Args
2022-04-11 14:57:53adminsetgithub: 63780
2013-11-18 20:30:50vstinnersetstatus: open -> closed
resolution: fixed
2013-11-18 20:19:07python-devsetnosy: + python-dev
messages: + msg203320
2013-11-18 09:28:13vstinnersetmessages: + msg203265
2013-11-14 16:59:50vstinnersetmessages: + msg202861
2013-11-14 16:04:13pitrousetnosy: + tim.peters
messages: + msg202855
2013-11-14 15:31:41vstinnersetmessages: + msg202851
2013-11-14 15:26:38serhiy.storchakasetmessages: + msg202849
2013-11-14 12:12:15pitrousetmessages: + msg202835
2013-11-14 09:37:25vstinnercreate