msg281092 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 11:03 |
_PyDict_NewPresized(6) creates dict which keysize is 8.
But it's capacity is 5 (8 * 2 // 3 = 5).
This internal API is used by BUILD_MAP and BUILD_CONST_KEYMAP.
|
msg281093 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 11:08 |
This patch includes fix for ESTIMATE_SIZE macro. (see below)
Same fix is included in patch for issue28147.
>>> def estimate_size(n):
... return n * 3 // 2 # Current ESTIMATE_SIZE
...
>>> def usable(n):
... return n * 2 // 3
...
>>> def keysize(minsize):
... size = 8
... while size < minsize: # Current implementation uses <=
... size *= 2
... return size
...
>>> def check():
... for i in range(1000):
... estimate = estimate_size(i)
... size = keysize(estimate)
... cap = usable(size)
... if cap < i:
... print(i, estimate, size, cap)
...
>>> check()
11 16 16 10
43 64 64 42
171 256 256 170
683 1024 1024 682
>>> #
>>> estimate_size = lambda n: (n * 3 +1) // 2 # Fixed version
>>> check()
>>>
|
msg281098 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 11:26 |
current:
$ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
.....................
Median +- std dev: 925 ns +- 49 ns
patched:
$ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
.....................
Median +- std dev: 726 ns +- 30 ns
|
msg281105 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-11-18 12:18 |
FYI if you rename Python binaries, you can use:
./python-patched perf timeit --compare-to=./python-orig
|
msg281106 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-11-18 12:18 |
> Median +- std dev: 925 ns +- 49 ns
> Median +- std dev: 726 ns +- 30 ns
Nice speedup. Can you please compare Python 3.5? Better if you compile it
yourself with the same options.
I'm curious if 925 ns is slower than py3.5 or if 726 ns is faster than
py3.5.
|
msg281108 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-18 12:31 |
The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow. But it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it when it already is happen. Second, after detecting the negative value just is passed to new_keys_object() which either is crashed in debug build or makes other integer overflow and creates invalid object.
I would add a runtime check that minused is less than PY_SSIZE_MAX/3 (or more strong PY_SSIZE_MAX/3*2/sizeof(Pobject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could be removed.
There is similar code in dictresize().
|
msg281109 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 12:49 |
inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 910 ns +- 49 ns
patched: ..................... 713 ns +- 26 ns
Median +- std dev: [master] 910 ns +- 49 ns -> [patched] 713 ns +- 26 ns: 1.28x faster
inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py35/bin/python3.5 -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
python3.5: ..................... 1.17 us +- 0.06 us
patched: ..................... 720 ns +- 24 ns
Median +- std dev: [python3.5] 1.17 us +- 0.06 us -> [patched] 720 ns +- 24 ns: 1.62x faster
|
msg281113 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 12:57 |
Python 3.5 has similar issue:
$ ~/local/py35/bin/patched -m perf timeit --compare-to ~/local/py35/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 1.15 us +- 0.09 us
patched: ..................... 885 ns +- 37 ns
Median +- std dev: [master] 1.15 us +- 0.09 us -> [patched] 885 ns +- 37 ns: 1.30x faster
patch:
diff -r 20f62e4a9c2f Objects/dictobject.c
--- a/Objects/dictobject.c Wed Nov 16 16:32:22 2016 -0800
+++ b/Objects/dictobject.c Fri Nov 18 12:56:38 2016 +0000
@@ -1015,8 +1015,9 @@ PyObject *
{
Py_ssize_t newsize;
PyDictKeysObject *new_keys;
+ minused = (minused * 3 + 1) / 2;
for (newsize = PyDict_MINSIZE_COMBINED;
- newsize <= minused && newsize > 0;
+ newsize < minused && newsize > 0;
newsize <<= 1)
;
new_keys = new_keys_object(newsize);
|
msg281117 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-11-18 13:33 |
Ok, it's an optimization, not a performance regression of Python 3.6. In this case, I suggest to first push the change to Python 3.7, and after 3.6.0 release, backport to 3.6 (if you want to), and maybe even 3.5 if it makes sense (and if you want).
I didn't review the patch.
|
msg281118 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 13:49 |
On Fri, Nov 18, 2016 at 9:31 PM, Serhiy Storchaka
<report@bugs.python.org> wrote:
>
> Serhiy Storchaka added the comment:
>
> The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow. But it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it when it already is happen. Second, after detecting the negative value just is passed to new_keys_object() which either is crashed in debug build or makes other integer overflow and creates invalid object.
>
> I would add a runtime check that minused is less than PY_SSIZE_MAX/3 (or more strong PY_SSIZE_MAX/3*2/sizeof(Pobject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could be removed.
>
> There is similar code in dictresize().
>
Nice idea. I'll update patch in issue28147.
In case of _PyDict_NewPresized(minused), it would be called from 3rd
party libraries, and there are no strong
guarantee about PyDict_SetItem() won't resize until minused items.
So how about more small, maximum presize?
#define MAX_INITSIZE (128 * 1024)
if (minused > USABLE_FRACTION(MAX_INITSIZE)) {
newsize = MAX_INITSIZE;
}
else {
newsize = PyDict_MINSIZE;
whilie (newsize < minused)
newsize <<= 1; // Can't we assume *= 2 is optimized?
};
|
msg281120 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-11-18 13:52 |
Even if the use case is limited (creation of small dictionaries), it's a nice enhancement, good job Naoki! It's nice to see someone looking at this very important Python type!
1.28x faster or 1.62x faster is significant for a micro-optimization! (My personal threshold is at least 10% faster for a micro-optimization.)
|
msg281121 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-18 13:58 |
> So how about more small, maximum presize?
I don't know what is the benefit of this, but this change looks correct. The benefit of preresizing is vanishingly small for very large dicts.
|
msg281122 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-18 14:09 |
This optimization affects only 6-element (and perhaps to less extend 11-element) dicts. But in any case this is good enhancement.
Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?
|
msg281123 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 14:14 |
Can I assume PY_SSIZE_T_MAX is 2**n-1?
If so, I can define like:
#define PyDict_MAXSIZE (PY_SSIZE_T/8+1)
|
msg281124 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-11-18 14:19 |
> Can I assume PY_SSIZE_T_MAX is 2**n-1?
I think so. Add an assertion just to be sure :-)
|
msg281136 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-18 16:10 |
> This optimization affects only 6-element (and perhaps to less extend 11-element) dicts. But in any case this is good enhancement.
This affects when minused = (n*2/3)+1 .. n-1 (n=8, 16, ...)
n=8: 6, 7
n=16: 11, 12, ..15
> Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5}'
master: ..................... 568 ns +- 18 ns
patched: ..................... 567 ns +- 23 ns
Median +- std dev: [master] 568 ns +- 18 ns -> [patched] 567 ns +- 23 ns: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
master: ..................... 1.01 us +- 0.04 us
patched: ..................... 756 ns +- 20 ns
Median +- std dev: [master] 1.01 us +- 0.04 us -> [patched] 756 ns +- 20 ns: 1.33x faster
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7}'
master: ..................... 1.12 us +- 0.03 us
patched: ..................... 867 ns +- 31 ns
Median +- std dev: [master] 1.12 us +- 0.03 us -> [patched] 867 ns +- 31 ns: 1.29x faster
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8}'
master: ..................... 1.04 us +- 0.03 us
patched: ..................... 1.03 us +- 0.04 us
Median +- std dev: [master] 1.04 us +- 0.03 us -> [patched] 1.03 us +- 0.04 us: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10}'
master: ..................... 1.16 us +- 0.04 us
patched: ..................... 1.16 us +- 0.04 us
Median +- std dev: [master] 1.16 us +- 0.04 us -> [patched] 1.16 us +- 0.04 us: 1.00x faster
Not significant!
+ /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10, "k":11}'
master: ..................... 1.67 us +- 0.06 us
patched: ..................... 1.39 us +- 0.04 us
Median +- std dev: [master] 1.67 us +- 0.06 us -> [patched] 1.39 us +- 0.04 us: 1.20x faster
|
msg281353 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-21 12:41 |
A variant in msg281118 looks better to me than 28731-PyDict_NewPresized-too-small-2.patch. It doesn't look good to me that newsize is set to arbitrary small value if estimated size is larger than very large PyDict_MAXSIZE (this is 2**28 on 32-bit and 2**60 on 64-bit).
|
msg281354 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-21 13:00 |
OK. I've updated the patch.
BTW, I used const Py_ssize_t for function local constant.
Should I use file scope macro for such heuristic threshold, even it
is used only one function?
|
msg281355 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-21 13:47 |
I think function scope constant is good. Similar code in dictresize() should have different bound.
The patch LGTM.
|
msg281370 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2016-11-21 15:50 |
Thanks.
As @haypo suggests, I'll commit this to default branch first, and
backport to 3.6 after 3.6.0 is released.
|
msg281372 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-11-21 15:57 |
New changeset b708b3190ecb by INADA Naoki in branch 'default':
Issue #28731: Optimize _PyDict_NewPresized() to create correct size dict
https://hg.python.org/cpython/rev/b708b3190ecb
|
msg282606 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-12-07 09:39 |
New changeset d03562dcbb82 by INADA Naoki in branch '3.6':
Issue #28731: Optimize _PyDict_NewPresized() to create correct size dict.
https://hg.python.org/cpython/rev/d03562dcbb82
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:39 | admin | set | github: 72917 |
2017-03-31 16:36:13 | dstufft | set | pull_requests:
+ pull_request891 |
2016-12-07 09:40:06 | methane | set | status: open -> closed resolution: fixed |
2016-12-07 09:39:37 | python-dev | set | messages:
+ msg282606 |
2016-11-21 15:57:16 | python-dev | set | nosy:
+ python-dev messages:
+ msg281372
|
2016-11-21 15:50:42 | methane | set | messages:
+ msg281370 |
2016-11-21 13:47:14 | serhiy.storchaka | set | messages:
+ msg281355 stage: patch review -> commit review |
2016-11-21 13:00:14 | methane | set | priority: high -> normal
messages:
+ msg281354 |
2016-11-21 12:56:04 | methane | set | files:
+ 28731-PyDict_NewPresized-too-small-3.patch |
2016-11-21 12:41:09 | serhiy.storchaka | set | messages:
+ msg281353 |
2016-11-21 11:33:29 | methane | set | files:
+ 28731-PyDict_NewPresized-too-small-2.patch |
2016-11-18 16:10:45 | methane | set | messages:
+ msg281136 |
2016-11-18 14:19:11 | vstinner | set | messages:
+ msg281124 |
2016-11-18 14:14:29 | methane | set | messages:
+ msg281123 |
2016-11-18 14:09:11 | serhiy.storchaka | set | messages:
+ msg281122 |
2016-11-18 13:58:43 | serhiy.storchaka | set | messages:
+ msg281121 |
2016-11-18 13:52:22 | vstinner | set | messages:
+ msg281120 |
2016-11-18 13:49:04 | methane | set | messages:
+ msg281118 |
2016-11-18 13:33:15 | vstinner | set | messages:
+ msg281117 |
2016-11-18 12:57:45 | methane | set | messages:
+ msg281113 |
2016-11-18 12:49:04 | methane | set | messages:
+ msg281109 |
2016-11-18 12:31:57 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg281108
|
2016-11-18 12:18:50 | vstinner | set | messages:
+ msg281106 |
2016-11-18 12:18:49 | vstinner | set | messages:
+ msg281105 |
2016-11-18 11:26:54 | methane | set | messages:
+ msg281098 |
2016-11-18 11:08:28 | methane | set | messages:
+ msg281093 |
2016-11-18 11:03:05 | methane | create | |