classification
Title: _PyDict_NewPresized() creates too small dict
Type: performance Stage: commit review
Components: Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: inada.naoki Nosy List: inada.naoki, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2016-11-18 11:03 by inada.naoki, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
PyDict_NewPresized-too-small.patch inada.naoki, 2016-11-18 11:03 review
28731-PyDict_NewPresized-too-small-2.patch inada.naoki, 2016-11-21 11:33 review
28731-PyDict_NewPresized-too-small-3.patch inada.naoki, 2016-11-21 12:56 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (22)
msg281092 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 11:03
_PyDict_NewPresized(6) creates dict which keysize is 8.
But it's capacity is 5 (8 * 2 // 3 = 5).

This internal API is used by BUILD_MAP and BUILD_CONST_KEYMAP.
msg281093 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 11:08
This patch includes fix for ESTIMATE_SIZE macro. (see below)
Same fix is included in patch for issue28147.

>>> def estimate_size(n):
...     return n * 3 // 2  # Current ESTIMATE_SIZE
...
>>> def usable(n):
...     return n * 2 // 3
...
>>> def keysize(minsize):
...     size = 8
...     while size < minsize:  # Current implementation uses <=
...         size *= 2
...     return size
...
>>> def check():
...     for i in range(1000):
...         estimate = estimate_size(i)
...         size = keysize(estimate)
...         cap = usable(size)
...         if cap < i:
...             print(i, estimate, size, cap)
...
>>> check()
11 16 16 10
43 64 64 42
171 256 256 170
683 1024 1024 682
>>> # 
>>> estimate_size = lambda n: (n * 3 +1) // 2  # Fixed version
>>> check()
>>>
msg281098 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 11:26
current:
$ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
.....................
Median +- std dev: 925 ns +- 49 ns

patched:
$ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
.....................
Median +- std dev: 726 ns +- 30 ns
msg281105 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 12:18
FYI if you rename Python binaries, you can use:

./python-patched perf timeit --compare-to=./python-orig
msg281106 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 12:18
> Median +- std dev: 925 ns +- 49 ns
> Median +- std dev: 726 ns +- 30 ns

Nice speedup. Can you please compare Python 3.5? Better if you compile it
yourself with the same options.

I'm curious if 925 ns is slower than py3.5 or if 726 ns is faster than
py3.5.
msg281108 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 12:31
The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow. But it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it when it already is happen. Second, after detecting the negative value just is passed to new_keys_object() which either is crashed in debug build or makes other integer overflow and creates invalid object.

I would add a runtime check that minused is less than PY_SSIZE_MAX/3 (or more strong PY_SSIZE_MAX/3*2/sizeof(Pobject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could be removed.

There is similar code in dictresize().
msg281109 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 12:49
inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 910 ns +- 49 ns
patched: ..................... 713 ns +- 26 ns

Median +- std dev: [master] 910 ns +- 49 ns -> [patched] 713 ns +- 26 ns: 1.28x faster

inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py35/bin/python3.5 -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
python3.5: ..................... 1.17 us +- 0.06 us
patched: ..................... 720 ns +- 24 ns

Median +- std dev: [python3.5] 1.17 us +- 0.06 us -> [patched] 720 ns +- 24 ns: 1.62x faster
msg281113 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 12:57
Python 3.5 has similar issue:
$ ~/local/py35/bin/patched -m perf timeit --compare-to ~/local/py35/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 1.15 us +- 0.09 us
patched: ..................... 885 ns +- 37 ns

Median +- std dev: [master] 1.15 us +- 0.09 us -> [patched] 885 ns +- 37 ns: 1.30x faster


patch:
diff -r 20f62e4a9c2f Objects/dictobject.c
--- a/Objects/dictobject.c      Wed Nov 16 16:32:22 2016 -0800
+++ b/Objects/dictobject.c      Fri Nov 18 12:56:38 2016 +0000
@@ -1015,8 +1015,9 @@ PyObject *
 {
     Py_ssize_t newsize;
     PyDictKeysObject *new_keys;
+    minused = (minused * 3 + 1) / 2;
     for (newsize = PyDict_MINSIZE_COMBINED;
-         newsize <= minused && newsize > 0;
+         newsize < minused && newsize > 0;
          newsize <<= 1)
         ;
     new_keys = new_keys_object(newsize);
msg281117 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 13:33
Ok, it's an optimization, not a performance regression of Python 3.6. In this case, I suggest to first push the change to Python 3.7, and after 3.6.0 release, backport to 3.6 (if you want to), and maybe even 3.5 if it makes sense (and if you want).

I didn't review the patch.
msg281118 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 13:49
On Fri, Nov 18, 2016 at 9:31 PM, Serhiy Storchaka
<report@bugs.python.org> wrote:
>
> Serhiy Storchaka added the comment:
>
> The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow. But it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it when it already is happen. Second, after detecting the negative value just is passed to new_keys_object() which either is crashed in debug build or makes other integer overflow and creates invalid object.
>
> I would add a runtime check that minused is less than PY_SSIZE_MAX/3 (or more strong PY_SSIZE_MAX/3*2/sizeof(Pobject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could be removed.
>
> There is similar code in dictresize().
>

Nice idea.  I'll update patch in issue28147.

In case of _PyDict_NewPresized(minused), it would be called from 3rd
party libraries, and there are no strong
guarantee about PyDict_SetItem() won't resize until minused items.
So how about more small, maximum presize?

#define MAX_INITSIZE (128 * 1024)

if (minused > USABLE_FRACTION(MAX_INITSIZE)) {
    newsize = MAX_INITSIZE;
}
else {
    newsize = PyDict_MINSIZE;
    whilie (newsize < minused)
        newsize <<= 1;   // Can't we assume *= 2 is optimized?
};
msg281120 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 13:52
Even if the use case is limited (creation of small dictionaries), it's a nice enhancement, good job Naoki! It's nice to see someone looking at this very important Python type!

1.28x faster or 1.62x faster is significant for a micro-optimization! (My personal threshold is at least 10% faster for a micro-optimization.)
msg281121 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 13:58
> So how about more small, maximum presize?

I don't know what is the benefit of this, but this change looks correct. The benefit of preresizing is vanishingly small for very large dicts.
msg281122 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-18 14:09
This optimization affects only 6-element (and perhaps to less extend 11-element) dicts. But in any case this is good enhancement.

Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?
msg281123 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 14:14
Can I assume PY_SSIZE_T_MAX is 2**n-1?
If so, I can define like:

#define PyDict_MAXSIZE (PY_SSIZE_T/8+1)
msg281124 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-18 14:19
> Can I assume PY_SSIZE_T_MAX is 2**n-1?

I think so. Add an assertion just to be sure :-)
msg281136 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-18 16:10
> This optimization affects only 6-element (and perhaps to less extend 11-element) dicts. But in any case this is good enhancement.

This affects when minused = (n*2/3)+1 .. n-1 (n=8, 16, ...)

n=8: 6, 7
n=16: 11, 12, ..15

> Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?

+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5}'
master: ..................... 568 ns +- 18 ns
patched: ..................... 567 ns +- 23 ns

Median +- std dev: [master] 568 ns +- 18 ns -> [patched] 567 ns +- 23 ns: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 1.01 us +- 0.04 us
patched: ..................... 756 ns +- 20 ns

Median +- std dev: [master] 1.01 us +- 0.04 us -> [patched] 756 ns +- 20 ns: 1.33x faster
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7}'
master: ..................... 1.12 us +- 0.03 us
patched: ..................... 867 ns +- 31 ns

Median +- std dev: [master] 1.12 us +- 0.03 us -> [patched] 867 ns +- 31 ns: 1.29x faster
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8}'
master: ..................... 1.04 us +- 0.03 us
patched: ..................... 1.03 us +- 0.04 us

Median +- std dev: [master] 1.04 us +- 0.03 us -> [patched] 1.03 us +- 0.04 us: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10}'
master: ..................... 1.16 us +- 0.04 us
patched: ..................... 1.16 us +- 0.04 us

Median +- std dev: [master] 1.16 us +- 0.04 us -> [patched] 1.16 us +- 0.04 us: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10, "k":11}'
master: ..................... 1.67 us +- 0.06 us
patched: ..................... 1.39 us +- 0.04 us

Median +- std dev: [master] 1.67 us +- 0.06 us -> [patched] 1.39 us +- 0.04 us: 1.20x faster
msg281353 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-21 12:41
A variant in msg281118 looks better to me than 28731-PyDict_NewPresized-too-small-2.patch. It doesn't look good to me that newsize is set to arbitrary small value if estimated size is larger than very large PyDict_MAXSIZE (this is 2**28 on 32-bit and 2**60 on 64-bit).
msg281354 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-21 13:00
OK. I've updated the patch.

BTW, I used const Py_ssize_t for function local constant.
Should I use file scope macro for such heuristic threshold, even it
is used only one function?
msg281355 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-21 13:47
I think function scope constant is good. Similar code in dictresize() should have different bound.

The patch LGTM.
msg281370 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-11-21 15:50
Thanks.
As @haypo suggests, I'll commit this to default branch first, and
backport to 3.6 after 3.6.0 is released.
msg281372 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-21 15:57
New changeset b708b3190ecb by INADA Naoki in branch 'default':
Issue #28731: Optimize _PyDict_NewPresized() to create correct size dict
https://hg.python.org/cpython/rev/b708b3190ecb
msg282606 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-12-07 09:39
New changeset d03562dcbb82 by INADA Naoki in branch '3.6':
Issue #28731: Optimize _PyDict_NewPresized() to create correct size dict.
https://hg.python.org/cpython/rev/d03562dcbb82
History
Date User Action Args
2017-03-31 16:36:13dstufftsetpull_requests: + pull_request891
2016-12-07 09:40:06inada.naokisetstatus: open -> closed
resolution: fixed
2016-12-07 09:39:37python-devsetmessages: + msg282606
2016-11-21 15:57:16python-devsetnosy: + python-dev
messages: + msg281372
2016-11-21 15:50:42inada.naokisetmessages: + msg281370
2016-11-21 13:47:14serhiy.storchakasetmessages: + msg281355
stage: patch review -> commit review
2016-11-21 13:00:14inada.naokisetpriority: high -> normal

messages: + msg281354
2016-11-21 12:56:04inada.naokisetfiles: + 28731-PyDict_NewPresized-too-small-3.patch
2016-11-21 12:41:09serhiy.storchakasetmessages: + msg281353
2016-11-21 11:33:29inada.naokisetfiles: + 28731-PyDict_NewPresized-too-small-2.patch
2016-11-18 16:10:45inada.naokisetmessages: + msg281136
2016-11-18 14:19:11vstinnersetmessages: + msg281124
2016-11-18 14:14:29inada.naokisetmessages: + msg281123
2016-11-18 14:09:11serhiy.storchakasetmessages: + msg281122
2016-11-18 13:58:43serhiy.storchakasetmessages: + msg281121
2016-11-18 13:52:22vstinnersetmessages: + msg281120
2016-11-18 13:49:04inada.naokisetmessages: + msg281118
2016-11-18 13:33:15vstinnersetmessages: + msg281117
2016-11-18 12:57:45inada.naokisetmessages: + msg281113
2016-11-18 12:49:04inada.naokisetmessages: + msg281109
2016-11-18 12:31:57serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg281108
2016-11-18 12:18:50vstinnersetmessages: + msg281106
2016-11-18 12:18:49vstinnersetmessages: + msg281105
2016-11-18 11:26:54inada.naokisetmessages: + msg281098
2016-11-18 11:08:28inada.naokisetmessages: + msg281093
2016-11-18 11:03:05inada.naokicreate