
_PyDict_NewPresized() creates too small dict #72917

Closed
methane opened this issue Nov 18, 2016 · 22 comments
Labels: 3.7 (EOL) end of life, performance (Performance or resource usage)

Comments


methane commented Nov 18, 2016

BPO 28731
Nosy @vstinner, @methane, @serhiy-storchaka
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
Files
  • PyDict_NewPresized-too-small.patch
  • 28731-PyDict_NewPresized-too-small-2.patch
  • 28731-PyDict_NewPresized-too-small-3.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/methane'
    closed_at = <Date 2016-12-07.09:40:06.027>
    created_at = <Date 2016-11-18.11:03:05.310>
    labels = ['3.7', 'performance']
    title = '_PyDict_NewPresized() creates too small dict'
    updated_at = <Date 2017-03-31.16:36:13.778>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:13.778>
    actor = 'dstufft'
    assignee = 'methane'
    closed = True
    closed_date = <Date 2016-12-07.09:40:06.027>
    closer = 'methane'
    components = []
    creation = <Date 2016-11-18.11:03:05.310>
    creator = 'methane'
    dependencies = []
    files = ['45531', '45583', '45586']
    hgrepos = []
    issue_num = 28731
    keywords = ['patch']
    message_count = 22.0
    messages = ['281092', '281093', '281098', '281105', '281106', '281108', '281109', '281113', '281117', '281118', '281120', '281121', '281122', '281123', '281124', '281136', '281353', '281354', '281355', '281370', '281372', '282606']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'methane', 'python-dev', 'serhiy.storchaka']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28731'
    versions = ['Python 3.6', 'Python 3.7']


    methane commented Nov 18, 2016

    _PyDict_NewPresized(6) creates a dict whose keysize is 8,
    but its capacity is only 5 (8 * 2 // 3 = 5).

    This internal API is used by BUILD_MAP and BUILD_CONST_KEY_MAP.
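The mismatch can be reproduced with a short pure-Python model of the sizing logic (PyDict_MINSIZE and the 2/3 usable fraction are from CPython's dictobject.c; the helper names here are illustrative, not the C API):

```python
# Pure-Python model of the presize logic described above (not the C code).
PyDict_MINSIZE = 8

def usable_fraction(size):
    # USABLE_FRACTION(n) in dictobject.c: a key table of `size` slots
    # holds at most size * 2 // 3 entries before it must resize.
    return size * 2 // 3

def presized_keysize(minused):
    # Mirrors the loop in _PyDict_NewPresized(): double the key table
    # size while it has not passed `minused` (the old code used <=).
    newsize = PyDict_MINSIZE
    while newsize <= minused:
        newsize <<= 1
    return newsize

size = presized_keysize(6)
print(size, usable_fraction(size))  # keysize 8, but only 5 usable entries
```

So a dict "presized" for 6 items must still resize once while BUILD_MAP fills it.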

    @methane methane added the 3.7 (EOL) end of life label Nov 18, 2016
    @methane methane self-assigned this Nov 18, 2016
    @methane methane added the performance Performance or resource usage label Nov 18, 2016

    methane commented Nov 18, 2016

    This patch includes a fix for the ESTIMATE_SIZE macro (see below).
    The same fix is included in the patch for bpo-28147.

    >>> def estimate_size(n):
    ...     return n * 3 // 2  # Current ESTIMATE_SIZE
    ...
    >>> def usable(n):
    ...     return n * 2 // 3
    ...
    >>> def keysize(minsize):
    ...     size = 8
    ...     while size < minsize:  # Current implementation uses <=
    ...         size *= 2
    ...     return size
    ...
    >>> def check():
    ...     for i in range(1000):
    ...         estimate = estimate_size(i)
    ...         size = keysize(estimate)
    ...         cap = usable(size)
    ...         if cap < i:
    ...             print(i, estimate, size, cap)
    ...
    >>> check()
    11 16 16 10
    43 64 64 42
    171 256 256 170
    683 1024 1024 682
    >>> # Fixed version:
    >>> estimate_size = lambda n: (n * 3 + 1) // 2
    >>> check()
    >>>


    methane commented Nov 18, 2016

    current:
    $ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    .....................
    Median +- std dev: 925 ns +- 49 ns

    patched:
    $ ~/work/python36/bin/python3 -m perf timeit -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    .....................
    Median +- std dev: 726 ns +- 30 ns

    @vstinner (Member)

    FYI if you rename the Python binaries, you can use:

    ./python-patched -m perf timeit --compare-to=./python-orig

    @vstinner (Member)

    Median +- std dev: 925 ns +- 49 ns
    Median +- std dev: 726 ns +- 30 ns

    Nice speedup. Can you please compare with Python 3.5? It's better if you
    compile it yourself with the same options.

    I'm curious if 925 ns is slower than py3.5 or if 726 ns is faster than
    py3.5.

    @serhiy-storchaka (Member)

    The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow, but it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it after it has already happened. Second, after the overflow is detected, the negative value is just passed to new_keys_object(), which either crashes in a debug build or triggers another integer overflow and creates an invalid object.

    I would add a runtime check that minused is less than PY_SSIZE_T_MAX/3 (or the stronger PY_SSIZE_T_MAX/3*2/sizeof(PyObject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could then be removed.

    There is similar code in dictresize().
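As a sketch of the suggested guard (the constant assumes a 64-bit Py_ssize_t, and the function name is illustrative, not the actual C code):

```python
PY_SSIZE_T_MAX = 2**63 - 1  # typical 64-bit build (assumption)

def checked_estimate(minused):
    # Reject oversized inputs up front so that minused * 3 can never
    # overflow a signed Py_ssize_t; with this guard, the old
    # `newsize > 0` loop test becomes unnecessary.
    if minused >= PY_SSIZE_T_MAX // 3:
        raise OverflowError("requested dict size is too large")
    return (minused * 3 + 1) // 2

print(checked_estimate(6))  # 9
```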


    methane commented Nov 18, 2016

    inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    master: ..................... 910 ns +- 49 ns
    patched: ..................... 713 ns +- 26 ns

    Median +- std dev: [master] 910 ns +- 49 ns -> [patched] 713 ns +- 26 ns: 1.28x faster

    inada-n@test1:~/work/py36$ ~/local/py36/bin/patched -m perf timeit --compare-to ~/local/py35/bin/python3.5 -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    python3.5: ..................... 1.17 us +- 0.06 us
    patched: ..................... 720 ns +- 24 ns

    Median +- std dev: [python3.5] 1.17 us +- 0.06 us -> [patched] 720 ns +- 24 ns: 1.62x faster


    methane commented Nov 18, 2016

    Python 3.5 has similar issue:
    $ ~/local/py35/bin/patched -m perf timeit --compare-to ~/local/py35/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    master: ..................... 1.15 us +- 0.09 us
    patched: ..................... 885 ns +- 37 ns

    Median +- std dev: [master] 1.15 us +- 0.09 us -> [patched] 885 ns +- 37 ns: 1.30x faster

    patch:
    diff -r 20f62e4a9c2f Objects/dictobject.c
    --- a/Objects/dictobject.c      Wed Nov 16 16:32:22 2016 -0800
    +++ b/Objects/dictobject.c      Fri Nov 18 12:56:38 2016 +0000
    @@ -1015,8 +1015,9 @@ PyObject *
     {
         Py_ssize_t newsize;
         PyDictKeysObject *new_keys;
    +    minused = (minused * 3 + 1) / 2;
         for (newsize = PyDict_MINSIZE_COMBINED;
    -         newsize <= minused && newsize > 0;
    +         newsize < minused && newsize > 0;
              newsize <<= 1)
             ;
         new_keys = new_keys_object(newsize);

    @vstinner (Member)

    Ok, it's an optimization, not a performance regression in Python 3.6. In this case, I suggest pushing the change to Python 3.7 first, then, after the 3.6.0 release, backporting to 3.6 (if you want to), and maybe even to 3.5 if it makes sense (and if you want).

    I didn't review the patch.


    methane commented Nov 18, 2016

    On Fri, Nov 18, 2016 at 9:31 PM, Serhiy Storchaka
    <report@bugs.python.org> wrote:

    Serhiy Storchaka added the comment:

    The condition in the loop in _PyDict_NewPresized() contains the test newsize > 0. This is a check for integer overflow, but it doesn't make much sense. First, the overflow is undefined behavior, and it is too late to detect it after it has already happened. Second, after the overflow is detected, the negative value is just passed to new_keys_object(), which either crashes in a debug build or triggers another integer overflow and creates an invalid object.

    I would add a runtime check that minused is less than PY_SSIZE_T_MAX/3 (or the stronger PY_SSIZE_T_MAX/3*2/sizeof(PyObject *)). This would guarantee that integer overflow is not possible. The test "newsize > 0" could then be removed.

    There is similar code in dictresize().

    Nice idea. I'll update the patch in bpo-28147.

    In the case of _PyDict_NewPresized(minused), it may be called from
    third-party libraries, and there is no strong guarantee that
    PyDict_SetItem() won't resize before minused items are inserted.
    So how about a smaller maximum presize?

    #define MAX_INITSIZE (128 * 1024)

    if (minused > USABLE_FRACTION(MAX_INITSIZE)) {
        newsize = MAX_INITSIZE;
    }
    else {
        newsize = PyDict_MINSIZE;
        while (newsize < minused)
            newsize <<= 1;   /* Can't we assume *= 2 is optimized? */
    }
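A pure-Python restatement of that capping idea (MAX_INITSIZE, PyDict_MINSIZE, and the 2/3 usable fraction follow the C snippet above; this is a sketch, not the committed code):

```python
PyDict_MINSIZE = 8
MAX_INITSIZE = 128 * 1024  # proposed cap from the snippet above

def usable_fraction(n):
    return n * 2 // 3

def capped_newsize(minused):
    # Cap the presize: beyond the cap, let the dict grow normally
    # through PyDict_SetItem() instead of preallocating a huge table.
    if minused > usable_fraction(MAX_INITSIZE):
        return MAX_INITSIZE
    newsize = PyDict_MINSIZE
    while newsize < minused:
        newsize <<= 1
    return newsize

print(capped_newsize(100))    # 128
print(capped_newsize(10**9))  # 131072 (capped)
```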

    @vstinner (Member)

    Even if the use case is limited (creation of small dictionaries), it's a nice enhancement, good job Naoki! It's nice to see someone looking at this very important Python type!

    1.28x faster or 1.62x faster is significant for a micro-optimization! (My personal threshold is at least 10% faster for a micro-optimization.)

    @serhiy-storchaka (Member)

    So how about a smaller maximum presize?

    I don't know what the benefit of this is, but the change looks correct. The benefit of presizing is vanishingly small for very large dicts.

    @serhiy-storchaka (Member)

    This optimization affects only 6-element (and perhaps, to a lesser extent, 11-element) dicts. But in any case this is a good enhancement.

    Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?


    methane commented Nov 18, 2016

    Can I assume PY_SSIZE_T_MAX is 2**n - 1?
    If so, I can define it like:

    #define PyDict_MAXSIZE (PY_SSIZE_T_MAX/8+1)
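Assuming PY_SSIZE_T_MAX is 2**n - 1, PY_SSIZE_T_MAX/8 + 1 comes out to an exact power of two; a quick check with the usual 64-bit value (an assumption, not part of the patch):

```python
PY_SSIZE_T_MAX = 2**63 - 1  # assumes a 64-bit Py_ssize_t

maxsize = PY_SSIZE_T_MAX // 8 + 1  # C integer division, as in the #define
print(maxsize == 2**60)              # True
print(maxsize & (maxsize - 1) == 0)  # True: maxsize is a power of two
```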

    @vstinner (Member)

    Can I assume PY_SSIZE_T_MAX is 2**n-1?

    I think so. Add an assertion just to be sure :-)


    methane commented Nov 18, 2016

    This optimization affects only 6-element (and perhaps, to a lesser extent, 11-element) dicts. But in any case this is a good enhancement.

    This affects when minused = (n*2/3)+1 .. n-1 (n = 8, 16, ...):

    n=8: 6, 7
    n=16: 11, 12, ..., 15
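Those ranges can be enumerated directly (a pure-Python restatement of the table above; the helper name is illustrative):

```python
def affected(n):
    # minused values for which the old code picks keysize n even though
    # n's usable capacity (n * 2 // 3) falls short of minused.
    return list(range(n * 2 // 3 + 1, n))

print(affected(8))   # [6, 7]
print(affected(16))  # [11, 12, 13, 14, 15]
```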

    Naoki, could you please make microbenchmarks for 5- and 7-element dicts with your next patch?

    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5}'
    master: ..................... 568 ns +- 18 ns
    patched: ..................... 567 ns +- 23 ns

    Median +- std dev: [master] 568 ns +- 18 ns -> [patched] 567 ns +- 23 ns: 1.00x faster
    Not significant!
    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}'
    master: ..................... 1.01 us +- 0.04 us
    patched: ..................... 756 ns +- 20 ns

    Median +- std dev: [master] 1.01 us +- 0.04 us -> [patched] 756 ns +- 20 ns: 1.33x faster
    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7}'
    master: ..................... 1.12 us +- 0.03 us
    patched: ..................... 867 ns +- 31 ns

    Median +- std dev: [master] 1.12 us +- 0.03 us -> [patched] 867 ns +- 31 ns: 1.29x faster
    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8}'
    master: ..................... 1.04 us +- 0.03 us
    patched: ..................... 1.03 us +- 0.04 us

    Median +- std dev: [master] 1.04 us +- 0.03 us -> [patched] 1.03 us +- 0.04 us: 1.00x faster
    Not significant!
    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10}'
    master: ..................... 1.16 us +- 0.04 us
    patched: ..................... 1.16 us +- 0.04 us

    Median +- std dev: [master] 1.16 us +- 0.04 us -> [patched] 1.16 us +- 0.04 us: 1.00x faster
    Not significant!
    + /home/inada-n/local/py36/bin/patched -m perf timeit --compare-to /home/inada-n/local/py36/bin/master -- 'd = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6, "g":7, "h":8, "i":9, "j":10, "k":11}'
    master: ..................... 1.67 us +- 0.06 us
    patched: ..................... 1.39 us +- 0.04 us

    Median +- std dev: [master] 1.67 us +- 0.06 us -> [patched] 1.39 us +- 0.04 us: 1.20x faster

    @serhiy-storchaka (Member)

    A variant in msg281118 looks better to me than 28731-PyDict_NewPresized-too-small-2.patch. It doesn't look good to me that newsize is set to an arbitrarily small value when the estimated size is larger than the very large PyDict_MAXSIZE (which is 2**28 on 32-bit and 2**60 on 64-bit).


    methane commented Nov 21, 2016

    OK. I've updated the patch.

    BTW, I used a const Py_ssize_t for the function-local constant.
    Should I use a file-scope macro for such a heuristic threshold, even if it
    is used in only one function?

    @serhiy-storchaka (Member)

    I think a function-scope constant is good. Similar code in dictresize() should have a different bound.

    The patch LGTM.


    methane commented Nov 21, 2016

    Thanks.
    As @Haypo suggested, I'll commit this to the default branch first, and
    backport to 3.6 after 3.6.0 is released.


    python-dev mannequin commented Nov 21, 2016

    New changeset b708b3190ecb by INADA Naoki in branch 'default':
    Issue bpo-28731: Optimize _PyDict_NewPresized() to create correct size dict
    https://hg.python.org/cpython/rev/b708b3190ecb


    python-dev mannequin commented Dec 7, 2016

    New changeset d03562dcbb82 by INADA Naoki in branch '3.6':
    Issue bpo-28731: Optimize _PyDict_NewPresized() to create correct size dict.
    https://hg.python.org/cpython/rev/d03562dcbb82

    @methane methane closed this as completed Dec 7, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022