Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 509: Add ma_version to PyDictObject #70246

Closed
vstinner opened this issue Jan 9, 2016 · 31 comments
Closed

PEP 509: Add ma_version to PyDictObject #70246

vstinner opened this issue Jan 9, 2016 · 31 comments
Labels
type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Jan 9, 2016

BPO 26058
Nosy @malemburg, @brettcannon, @pitrou, @vstinner, @bitdancer, @1st1, @jstasiak
Files
  • dict_version.patch
  • dict_version-2.patch
  • dict_version-3.patch
  • dict_version-4.patch
  • guard_benchmark.patch
  • guard_benchmark.py
  • dict_version-5.patch
  • dict_version-6.patch
  • dict_version-7.patch
  • dict_version-8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-08.20:07:22.991>
    created_at = <Date 2016-01-09.09:30:48.408>
    labels = ['type-feature']
    title = 'PEP 509: Add ma_version to PyDictObject'
    updated_at = <Date 2016-09-10.04:52:35.818>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2016-09-10.04:52:35.818>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-08.20:07:22.991>
    closer = 'vstinner'
    components = []
    creation = <Date 2016-01-09.09:30:48.408>
    creator = 'vstinner'
    dependencies = []
    files = ['41542', '41547', '41581', '41599', '41604', '41605', '41691', '41701', '42476', '42518']
    hgrepos = []
    issue_num = 26058
    keywords = ['patch']
    message_count = 31.0
    messages = ['257809', '257810', '257814', '257818', '257819', '257820', '257848', '257875', '257880', '257961', '257992', '258138', '258139', '258151', '258820', '258824', '258826', '258867', '263414', '263525', '263726', '263727', '263728', '263729', '263730', '263731', '263732', '263787', '275132', '275133', '275566']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'brett.cannon', 'pitrou', 'vstinner', 'r.david.murray', 'python-dev', 'yselivanov', 'jstasiak']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26058'
    versions = ['Python 3.6']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    Attached patch implements the following PEP currently discussed on python-ideas:
    https://faster-cpython.readthedocs.org/pep_dict_version.html#pep-dict-version

    @vstinner vstinner added the type-feature A feature request or enhancement label Jan 9, 2016
    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    Result of pybench on Dict microbenchmarks:

                  DictCreation:    44ms    49ms  -10.5%    44ms    49ms  -10.7%
             DictWithFloatKeys:    35ms    35ms   -0.5%    35ms    35ms   -1.0%
           DictWithIntegerKeys:    28ms    28ms   -1.2%    28ms    29ms   -2.3%
            DictWithStringKeys:    26ms    27ms   -3.2%    26ms    28ms   -4.8%
        SimpleDictManipulation:    52ms    53ms   -0.7%    53ms    53ms   -0.4%
    

    Hum, as usuall, pybench doesn't seem reliable at all: I expect worse performance with the patch since it adds "version++" in dict.__setimte__(). I don't really trust pybench, results seem to have a lot of noise :-/

    Maybe I'm not using pybench correctly? I used:

    $ make distclean; ./configure && make
    $ ./python Tools/pybench/pybench.py -f pybench.default
    $ patch -p1 < dict_version.patch
    $ ./python Tools/pybench/pybench.py -f pybench.dictversion
    $ ./python Tools/pybench/pybench.py -s pybench.dictversion -c pybench.default

    Full output:

    -------------------------------------------------------------------------------
    PYBENCH 2.1
    -------------------------------------------------------------------------------

    • using CPython 3.6.0a0 (default:53271aa4d84c+, Jan 9 2016, 10:27:40) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)]
    • disabled garbage collection
    • system check interval set to maximum: 2147483647
    • using timer: time.perf_counter
    • timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)

    -------------------------------------------------------------------------------
    Benchmark: pybench.dictversion
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.2.5-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python
       Version:        3.6.0a0
       Compiler:       GCC 5.1.1 20150618 (Red Hat 5.1.1-4)
       Bits:           64bit
       Build:          Jan  9 2016 10:27:40 (#default:53271aa4d84c+)
       Unicode:        UCS4
    

    -------------------------------------------------------------------------------
    Comparing with: pybench.default
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.2.5-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python
       Version:        3.6.0a0
       Compiler:       GCC 5.1.1 20150618 (Red Hat 5.1.1-4)
       Bits:           64bit
       Build:          Jan  9 2016 10:21:57 (#default:53271aa4d84c)
       Unicode:        UCS4
    

    Test minimum run-time average run-time
    this other diff this other diff
    -------------------------------------------------------------------------------
    BuiltinFunctionCalls: 42ms 42ms +0.1% 43ms 43ms -0.9%
    BuiltinMethodLookup: 26ms 25ms +5.7% 26ms 25ms +5.1%
    CompareFloats: 27ms 27ms -0.9% 27ms 28ms -4.2%
    CompareFloatsIntegers: 60ms 63ms -3.3% 61ms 65ms -6.8%
    CompareIntegers: 41ms 38ms +7.9% 41ms 38ms +7.2%
    CompareInternedStrings: 30ms 28ms +5.7% 30ms 28ms +5.0%
    CompareLongs: 24ms 22ms +8.6% 24ms 22ms +8.5%
    CompareStrings: 22ms 22ms +0.3% 23ms 24ms -6.6%
    ComplexPythonFunctionCalls: 43ms 42ms +1.2% 43ms 45ms -5.2%
    ConcatStrings: 29ms 32ms -11.0% 29ms 33ms -13.6%
    CreateInstances: 45ms 45ms +0.3% 46ms 46ms -0.4%
    CreateNewInstances: 34ms 34ms +0.6% 35ms 34ms +0.7%
    CreateStringsWithConcat: 58ms 58ms +0.1% 58ms 58ms -0.1%
    DictCreation: 44ms 49ms -10.5% 44ms 49ms -10.7%
    DictWithFloatKeys: 35ms 35ms -0.5% 35ms 35ms -1.0%
    DictWithIntegerKeys: 28ms 28ms -1.2% 28ms 29ms -2.3%
    DictWithStringKeys: 26ms 27ms -3.2% 26ms 28ms -4.8%
    ForLoops: 22ms 22ms +0.4% 22ms 22ms +0.6%
    IfThenElse: 34ms 34ms +0.9% 34ms 34ms +0.8%
    ListSlicing: 34ms 34ms -0.2% 34ms 34ms -0.1%
    NestedForLoops: 37ms 36ms +2.1% 37ms 36ms +2.1%
    NestedListComprehensions: 36ms 35ms +1.4% 36ms 36ms +1.8%
    NormalClassAttribute: 75ms 77ms -2.5% 75ms 77ms -2.3%
    NormalInstanceAttribute: 37ms 37ms +2.2% 38ms 37ms +2.5%
    PythonFunctionCalls: 37ms 36ms +1.8% 37ms 37ms +1.6%
    PythonMethodCalls: 50ms 47ms +5.5% 50ms 48ms +4.5%
    Recursion: 61ms 61ms -0.2% 61ms 62ms -0.2%
    SecondImport: 35ms 36ms -2.9% 35ms 37ms -3.3%
    SecondPackageImport: 37ms 37ms -0.3% 37ms 37ms -0.4%
    SecondSubmoduleImport: 89ms 87ms +2.1% 90ms 88ms +1.6%
    SimpleComplexArithmetic: 23ms 23ms +0.0% 24ms 24ms -0.1%
    SimpleDictManipulation: 52ms 53ms -0.7% 53ms 53ms -0.4%
    SimpleFloatArithmetic: 25ms 25ms -1.2% 25ms 25ms -1.0%
    SimpleIntFloatArithmetic: 32ms 32ms -0.3% 32ms 32ms -1.1%
    SimpleIntegerArithmetic: 32ms 32ms -0.6% 32ms 32ms -0.3%
    SimpleListComprehensions: 29ms 28ms +2.3% 30ms 29ms +3.2%
    SimpleListManipulation: 27ms 28ms -1.3% 28ms 28ms -1.4%
    SimpleLongArithmetic: 22ms 22ms +3.6% 23ms 22ms +4.4%
    SmallLists: 38ms 38ms -1.9% 38ms 41ms -7.0%
    SmallTuples: 47ms 44ms +7.8% 47ms 44ms +7.5%
    SpecialClassAttribute: 75ms 73ms +2.3% 75ms 73ms +2.2%
    SpecialInstanceAttribute: 38ms 39ms -1.8% 38ms 39ms -2.1%
    StringMappings: 84ms 83ms +0.8% 84ms 84ms +0.5%
    StringPredicates: 49ms 49ms -0.6% 49ms 49ms -1.0%
    StringSlicing: 42ms 42ms +0.8% 42ms 42ms +0.7%
    TryExcept: 25ms 25ms -0.4% 25ms 25ms -0.8%
    TryFinally: 31ms 31ms +1.1% 32ms 31ms +1.4%
    TryRaiseExcept: 12ms 12ms -0.8% 12ms 12ms -1.4%
    TupleSlicing: 43ms 43ms +0.1% 43ms 43ms -0.6%
    WithFinally: 49ms 50ms -0.9% 49ms 50ms -0.9%
    WithRaiseExcept: 39ms 39ms -0.6% 39ms 40ms -1.6%
    -------------------------------------------------------------------------------
    Totals: 2011ms 2006ms +0.2% 2025ms 2035ms -0.5%

    (this=pybench.dictversion, other=pybench.default)

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    Using timeit to microbenchmark dict operations (constructor, __setitem__, __delitem__, clear), I get exactly the same performance or even better performance (???) with the patch.

    ./python -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

    Original: 318 ns
    Patched: 318 ns

    ./python -m timeit 'd={i:i for i in range(2**16)}' 'for i in range(2**16): d[i]=i-1' 'for i in range(2**16): d[i]=i+1' 'for i in range(2**15): del d[i]' 'd.clear()'

    Original: 19.9 ms
    Patched: 18.9 ms (5% faster)

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    Updated patch, I forgot to include unit tests.

    I added a test on integer overflow on the version. The patch adds _testcapi.dict_setversion(dict, version) and _testcapi.PY_SIZE_MAX.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 9, 2016

    I don't understand this:

    The version 0 is reserved for "missing key"

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    I don't understand this:
    > The version 0 is reserved for "missing key"

    Oh, ignore it, it's an outdated comment :-) Dictionary version is initialized to 0, it's not more "reserved".

    FYI I added the comment in my first implementation which also added a version to dictionary *entries*.

    @brettcannon
    Copy link
    Member

    A better way to benchmark this is to run hg.python.org/benchmarks with a patched interpreter and see if that shows any performance impact.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2016

    Brett:

    A better way to benchmark this is to run hg.python.org/benchmarks with a patched interpreter and see if that shows any performance impact.

    Ok. Here is the output. Can you please explain me the result? :-) Especially the "significant" value.

    $ ./python.orig ../benchmarks/perf.py ./python.orig ./python.patched

    INFO:root:Automatically selected timer: perf_counter
    INFO:root:Running ./python.patched ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3
    INFO:root:Running ./python.patched ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 1 time
    INFO:root:Running ./python.orig ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3
    INFO:root:Running ./python.orig ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 1 time
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_chameleon_v2.py -n 50 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_chameleon_v2.py -n 50 --timer perf_counter
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_django_v3.py -n 50 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_django_v3.py -n 50 --timer perf_counter
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_json_v2.py -n 50 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_json_v2.py -n 50 --timer perf_counter
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_json.py -n 50 --timer perf_counter json_load
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_json.py -n 50 --timer perf_counter json_load
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_nbody.py -n 50 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_nbody.py -n 50 --timer perf_counter
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_regex_v8.py -n 50 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_regex_v8.py -n 50 --timer perf_counter
    INFO:root:Running ./python.patched ../benchmarks/performance/bm_tornado_http.py -n 100 --timer perf_counter
    INFO:root:Running ./python.orig ../benchmarks/performance/bm_tornado_http.py -n 100 --timer perf_counter
    [ 1/10] 2to3...
    [ 2/10] chameleon_v2...
    [ 3/10] django_v3...
    [ 4/10] fastpickle...
    [ 5/10] fastunpickle...
    [ 6/10] json_dump_v2...
    [ 7/10] json_load...
    [ 8/10] nbody...
    [ 9/10] regex_v8...
    [10/10] tornado_http...

    Report on Linux smithers 4.2.8-300.fc23.x86_64 #1 SMP Tue Dec 15 16:49:06 UTC 2015 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    6.825440 -> 6.996616: 1.03x slower

    ### fastpickle ###
    Min: 0.449710 -> 0.458054: 1.02x slower
    Avg: 0.451589 -> 0.583100: 1.29x slower
    Significant (t=-8.04)
    Stddev: 0.00076 -> 0.11568: 152.3886x larger

    ### fastunpickle ###
    Min: 0.549672 -> 0.545734: 1.01x faster
    Avg: 0.551284 -> 0.696920: 1.26x slower
    Significant (t=-6.65)
    Stddev: 0.00118 -> 0.15493: 130.9694x larger

    ### json_load ###
    Min: 0.432500 -> 0.466602: 1.08x slower
    Avg: 0.433771 -> 0.471205: 1.09x slower
    Significant (t=-10.63)
    Stddev: 0.00495 -> 0.02440: 4.9303x larger

    ### nbody ###
    Min: 0.230055 -> 0.234925: 1.02x slower
    Avg: 0.230985 -> 0.236441: 1.02x slower
    Significant (t=-7.56)
    Stddev: 0.00048 -> 0.00508: 10.5095x larger

    The following not significant results are hidden, use -v to show them:
    chameleon_v2, django_v3, json_dump_v2, regex_v8, tornado_http.

    @brettcannon
    Copy link
    Member

    So min is the fastest time in a benchmark execution, average is the average across all benchmark executions, and the t-value is some statistics thing. Anything marked insignificant had such a small average difference it isn't worth reporting.

    If you want to smooth out the numbers you should do a rigorous run that uses more loops per benchmark to help smooth out outliers. And if you are doing this on Linux there is a flag to measure memory usage as well.

    @vstinner vstinner changed the title Add dict.__version__ read-only property PEP 509: Add ma_version to PyDictObject Jan 11, 2016
    @vstinner
    Copy link
    Member Author

    The PEP is now at python.org: PEP-509.

    @vstinner
    Copy link
    Member Author

    Patch version 3 updated for the second version of the PEP-509. Main changes:

    • ma_version now has the type PY_UINT64_T
    • remove dict.__version__ property: replace with _testcapi.dict_get_version() for tests

    @vstinner
    Copy link
    Member Author

    Patch version 4: I forgot to include my patch to update test_sys. Now the full Python test suite pass.

    @vstinner
    Copy link
    Member Author

    I'm not a big fan of pybench (it looks unstable and so not reliable), but here are results with dict_version-4.patch.

    I used more loops and a lower warp factor to get more reliable tests (I hope):

    ./python Tools/pybench/pybench.py -f pybench.orig -w 2 -C 100 -n 25
    ./python.patched Tools/pybench/pybench.py -f pybench.dictver -w 2 -C 100 -n 25
    ./python Tools/pybench/pybench.py -s pybench.dictver -c pybench.orig

    -------------------------------------------------------------------------------
    PYBENCH 2.1
    -------------------------------------------------------------------------------

    • using CPython 3.6.0a0 (default:77d24f51effc+, Jan 13 2016, 11:27:53) [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]
    • disabled garbage collection
    • system check interval set to maximum: 2147483647
    • using timer: time.perf_counter
    • timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)

    -------------------------------------------------------------------------------
    Benchmark: pybench.dictver
    -------------------------------------------------------------------------------

    Rounds: 25
    Warp:   2
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python
       Version:        3.6.0a0
       Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
       Bits:           64bit
       Build:          Jan 13 2016 11:27:53 (#default:77d24f51effc+)
       Unicode:        UCS4
    

    -------------------------------------------------------------------------------
    Comparing with: pybench.orig
    -------------------------------------------------------------------------------

    Rounds: 25
    Warp:   2
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python
       Version:        3.6.0a0
       Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
       Bits:           64bit
       Build:          Jan 13 2016 11:14:50 (#default:77d24f51effc)
       Unicode:        UCS4
    

    Test minimum run-time average run-time
    this other diff this other diff
    -------------------------------------------------------------------------------
    BuiltinFunctionCalls: 230ms 229ms +0.4% 241ms 230ms +5.1%
    BuiltinMethodLookup: 130ms 132ms -1.2% 135ms 134ms +0.4%
    CompareFloats: 147ms 149ms -1.4% 151ms 149ms +1.2%
    CompareFloatsIntegers: 330ms 333ms -0.8% 347ms 335ms +3.6%
    CompareIntegers: 214ms 209ms +2.5% 223ms 209ms +6.5%
    CompareInternedStrings: 160ms 145ms +10.8% 170ms 145ms +16.9%
    CompareLongs: 121ms 120ms +0.2% 124ms 120ms +3.4%
    CompareStrings: 132ms 131ms +0.7% 138ms 132ms +4.8%
    ComplexPythonFunctionCalls: 233ms 235ms -0.7% 241ms 238ms +1.1%
    ConcatStrings: 166ms 165ms +0.3% 177ms 167ms +6.0%
    CreateInstances: 240ms 247ms -3.1% 253ms 249ms +1.5%
    CreateNewInstances: 178ms 186ms -4.2% 188ms 188ms +0.1%
    CreateStringsWithConcat: 315ms 316ms -0.5% 331ms 318ms +4.3%
    DictCreation: 254ms 236ms +7.8% 262ms 237ms +10.5%
    DictWithFloatKeys: 211ms 199ms +6.1% 219ms 201ms +8.9%
    DictWithIntegerKeys: 171ms 163ms +5.4% 180ms 166ms +9.0%
    DictWithStringKeys: 163ms 142ms +14.5% 170ms 144ms +17.7%
    ForLoops: 121ms 121ms -0.3% 125ms 124ms +0.9%
    IfThenElse: 179ms 178ms +0.7% 185ms 178ms +3.6%
    ListSlicing: 194ms 193ms +0.4% 198ms 194ms +2.2%
    NestedForLoops: 212ms 210ms +1.2% 220ms 210ms +4.6%
    NestedListComprehensions: 205ms 212ms -3.3% 218ms 215ms +1.5%
    NormalClassAttribute: 429ms 407ms +5.5% 446ms 408ms +9.3%
    NormalInstanceAttribute: 212ms 206ms +2.8% 226ms 209ms +8.0%
    PythonFunctionCalls: 208ms 210ms -1.4% 215ms 212ms +1.2%
    PythonMethodCalls: 275ms 253ms +8.7% 293ms 255ms +14.9%
    Recursion: 333ms 328ms +1.4% 366ms 329ms +11.0%
    SecondImport: 190ms 188ms +0.8% 201ms 188ms +6.7%
    SecondPackageImport: 195ms 192ms +1.8% 214ms 192ms +11.7%
    SecondSubmoduleImport: 472ms 447ms +5.7% 502ms 455ms +10.3%
    SimpleComplexArithmetic: 141ms 138ms +2.1% 146ms 139ms +5.2%
    SimpleDictManipulation: 287ms 288ms -0.3% 313ms 290ms +8.0%
    SimpleFloatArithmetic: 138ms 137ms +0.7% 151ms 140ms +7.9%
    SimpleIntFloatArithmetic: 172ms 180ms -4.3% 182ms 181ms +0.4%
    SimpleIntegerArithmetic: 172ms 169ms +1.9% 184ms 170ms +8.1%
    SimpleListComprehensions: 166ms 167ms -0.7% 179ms 173ms +3.3%
    SimpleListManipulation: 159ms 153ms +3.9% 166ms 155ms +7.6%
    SimpleLongArithmetic: 118ms 117ms +1.2% 123ms 118ms +4.3%
    SmallLists: 210ms 204ms +2.8% 221ms 205ms +7.3%
    SmallTuples: 244ms 252ms -3.1% 255ms 256ms -0.2%
    SpecialClassAttribute: 424ms 416ms +1.8% 442ms 419ms +5.5%
    SpecialInstanceAttribute: 214ms 211ms +1.3% 223ms 214ms +4.5%
    StringMappings: 442ms 446ms -0.8% 460ms 447ms +2.9%
    StringPredicates: 259ms 258ms +0.2% 281ms 260ms +8.2%
    StringSlicing: 218ms 220ms -1.2% 228ms 221ms +3.3%
    TryExcept: 130ms 135ms -3.9% 134ms 136ms -1.4%
    TryFinally: 173ms 171ms +0.9% 180ms 172ms +4.4%
    TryRaiseExcept: 62ms 63ms -1.2% 63ms 63ms +0.9%
    TupleSlicing: 223ms 224ms -0.3% 251ms 232ms +8.0%
    WithFinally: 263ms 264ms -0.4% 280ms 265ms +5.4%
    WithRaiseExcept: 205ms 203ms +0.7% 215ms 204ms +5.5%
    -------------------------------------------------------------------------------
    Totals: 11040ms 10900ms +1.3% 11635ms 10991ms +5.9%

    (this=pybench.dictver, other=pybench.orig)

    @vstinner
    Copy link
    Member Author

    guard_benchmark.patch: patch adding a _testcapi.guard_benchmark(), a microbenchmark on dictionary guard. The benchmark measures the cost of checking if a dictionary key was modified.

    Run the benchmark with attached guard_benchmark.py. Result on my PC:

    • PyObject_GetItem(): 10.2 ns
    • PyDict_GetItem(): 9.1 ns
    • guard->check(): 2.9 ns

    python3 -m platform:
    Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three

    You have to modify manually _testcapi.c to choose between the 3 implementations.

    guard_benchmark.patch requires the issue bpo-26098 patch and the fat module which implements fat.GuardDict. The fat module can be found at:
    https://github.com/haypo/fat

    @vstinner
    Copy link
    Member Author

    Patch version 5: a global counter is now used to set ma_version field of dictionaries. The global counter is incremented each time that a dictionary is created and each time that a dictionary is modified.

    A dictionary version is now unique: two dictionaries cannot have the same version. So if a guard stores a version, the check on the version will fail if a different dictionary is used.

    @1st1
    Copy link
    Member

    1st1 commented Jan 22, 2016

    Patch version 5: a global counter is now used to set ma_version field of dictionaries. The global counter is incremented each time that a dictionary is created and each time that a dictionary is modified.

    This is great, thank you, Victor.

    @vstinner
    Copy link
    Member Author

    This is great, thank you, Victor.

    I will update the PEP-509 later for the global counter.

    @vstinner
    Copy link
    Member Author

    Patch version 6: remove now unused function _testcapi.dict_set_version(). I also moved tests to test_pep509.py to make it more explicit that the implementation of the PEP-509 is not part the Python dictionary type specification, other Python implementations can choose to not implement it.

    @vstinner
    Copy link
    Member Author

    The implementation is outdated: ma_version was renamed to ma_version_tag in the latest version of the PEP-509.

    @vstinner
    Copy link
    Member Author

    The implementation is outdated: ma_version was renamed to ma_version_tag in the latest version of the PEP-509.

    Patch version 7 is updated to the latest PEP (rebased and rename ma_version to ma_version_tag).

    @vstinner
    Copy link
    Member Author

    Patch version 8:

    • Update to the latest PEP: remove micro-optimization in dictobject.c if the new value is identical to the current value, dict.__setitem__() now always changes the version
    • Refactor test_pep509 to address Brett's comments
    • Add new unit tests on identical values
    • Add new unit tests on equal values (with special __eq__ method)

    @vstinner
    Copy link
    Member Author

    timeit microbenchmarks on dict_version-8.patch, minimum of 10 runs.

    $ ./python.orig -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'
    1000000 loops, best of 3: 0.292 usec per loop
    $ ./python.version -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'
    1000000 loops, best of 3: 0.293 usec per loop

    => 1 nanosecond (0.3%) slower

    $ ./python.orig -m timeit 'd={i:i for i in range(2**16)}' 'for i in range(2**16): d[i]=i-1' 'for i in range(2**16): d[i]=i+1' 'for i in range(2**15): del d[i]' 'd.clear()'
    10 loops, best of 3: 21.2 msec per loop
    $ ./python.version -m timeit 'd={i:i for i in range(2**16)}' 'for i in range(2**16): d[i]=i-1' 'for i in range(2**16): d[i]=i+1' 'for i in range(2**15): del d[i]' 'd.clear()'
    10 loops, best of 3: 21.3 msec per loop

    => 0.1 ms (0.5%) slower

    @vstinner
    Copy link
    Member Author

    I ran again timeit microbenchmarks with CPU isolation on dict_version-8.patch, minimum of 10 runs.

    -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

    • Original: 287 ns
    • Version: 289 ns (+2 ns, +0.7%)

    -m timeit 'd={i:i for i in range(2**16)}' 'for i in range(2**16): d[i]=i-1' 'for i in range(2**16): d[i]=i+1' 'for i in range(2**15): del d[i]' 'd.clear()'

    • Original: 21.2 msec
    • Version: 21.4 msec (+0.2 ms, +0.9%)

    @vstinner
    Copy link
    Member Author

    pybench results on dict_version-8.patch with CPU isolation: http://haypo-notes.readthedocs.org/microbenchmark.html

    As usual, I'm very skeptical on the pybench results which almost look like noise. I don't understand how my change can make any operation *faster*, whereas some benchmarks are faster with the patch...

    Dict microbenchmarks:

           DictCreation:    38ms    36ms   +4.8%    39ms    37ms   +3.9%
      DictWithFloatKeys:    40ms    40ms   -0.8%    40ms    40ms   -0.4%
    DictWithIntegerKeys:    33ms    31ms   +7.2%    33ms    31ms   +7.6%
     DictWithStringKeys:    29ms    28ms   +0.4%    29ms    29ms   +0.7%
    

    SimpleDictManipulation: 59ms 59ms -0.4% 59ms 59ms -0.4%

    Full output:

    $ ./python.version Tools/pybench/pybench.py -f pybench.version
    $ ./python.orig Tools/pybench/pybench.py -f pybench.orig
    $ ./python.orig Tools/pybench/pybench.py -s pybench.version -c pybench.orig 

    PYBENCH 2.1
    -------------------------------------------------------------------------------

    • using CPython 3.6.0a0 (default:e281a57d5b29, Apr 19 2016, 12:30:36) [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]
    • disabled garbage collection
    • system check interval set to maximum: 2147483647
    • using timer: time.perf_counter
    • timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)

    -------------------------------------------------------------------------------
    Benchmark: pybench.version
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.4.4-301.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python.version
       Version:        3.6.0a0
       Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
       Bits:           64bit
       Build:          Apr 19 2016 12:29:16 (#default:e281a57d5b29+)
       Unicode:        UCS4
    

    -------------------------------------------------------------------------------
    Comparing with: pybench.orig
    -------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.perf_counter
    
    Machine Details:
       Platform ID:    Linux-4.4.4-301.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
       Processor:      x86_64
    
    Python:
       Implementation: CPython
       Executable:     /home/haypo/prog/python/default/python.orig
       Version:        3.6.0a0
       Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
       Bits:           64bit
       Build:          Apr 19 2016 12:30:36 (#default:e281a57d5b29)
       Unicode:        UCS4
    

    Test minimum run-time average run-time
    this other diff this other diff
    -------------------------------------------------------------------------------
    BuiltinFunctionCalls: 49ms 50ms -1.3% 49ms 50ms -1.2%
    BuiltinMethodLookup: 26ms 26ms -0.8% 26ms 27ms -1.2%
    CompareFloats: 29ms 30ms -2.0% 29ms 30ms -1.9%
    CompareFloatsIntegers: 39ms 39ms +1.4% 39ms 39ms +1.5%
    CompareIntegers: 43ms 43ms -1.1% 43ms 43ms -1.1%
    CompareInternedStrings: 28ms 28ms -0.2% 28ms 28ms -0.3%
    CompareLongs: 25ms 25ms +2.8% 25ms 25ms +2.9%
    CompareStrings: 26ms 27ms -0.8% 26ms 27ms -1.6%
    ComplexPythonFunctionCalls: 44ms 44ms -1.6% 44ms 45ms -1.6%
    ConcatStrings: 35ms 33ms +6.4% 35ms 33ms +6.1%
    CreateInstances: 49ms 48ms +2.6% 50ms 49ms +1.8%
    CreateNewInstances: 37ms 36ms +2.5% 37ms 36ms +2.2%
    CreateStringsWithConcat: 65ms 63ms +3.3% 66ms 64ms +2.9%
    DictCreation: 38ms 36ms +4.8% 39ms 37ms +3.9%
    DictWithFloatKeys: 40ms 40ms -0.8% 40ms 40ms -0.4%
    DictWithIntegerKeys: 33ms 31ms +7.2% 33ms 31ms +7.6%
    DictWithStringKeys: 29ms 28ms +0.4% 29ms 29ms +0.7%
    ForLoops: 25ms 25ms -0.4% 26ms 26ms -0.3%
    IfThenElse: 37ms 35ms +3.3% 37ms 36ms +3.0%
    ListSlicing: 39ms 38ms +0.3% 39ms 39ms +0.0%
    NestedForLoops: 40ms 40ms +0.1% 40ms 40ms -0.0%
    NestedListComprehensions: 41ms 42ms -0.2% 42ms 42ms -0.9%
    NormalClassAttribute: 82ms 78ms +4.0% 82ms 79ms +3.8%
    NormalInstanceAttribute: 43ms 42ms +1.9% 44ms 43ms +1.8%
    PythonFunctionCalls: 40ms 42ms -5.0% 41ms 43ms -4.8%
    PythonMethodCalls: 51ms 52ms -1.1% 51ms 52ms -0.8%
    Recursion: 69ms 72ms -4.8% 69ms 73ms -4.7%
    SecondImport: 37ms 38ms -0.1% 37ms 38ms -0.1%
    SecondPackageImport: 40ms 39ms +1.9% 40ms 39ms +1.6%
    SecondSubmoduleImport: 96ms 97ms -1.0% 96ms 97ms -1.1%
    SimpleComplexArithmetic: 28ms 27ms +1.7% 29ms 28ms +1.9%
    SimpleDictManipulation: 59ms 59ms -0.4% 59ms 59ms -0.4%
    SimpleFloatArithmetic: 27ms 27ms +2.2% 27ms 27ms +2.3%
    SimpleIntFloatArithmetic: 31ms 31ms +1.6% 31ms 31ms +1.7%
    SimpleIntegerArithmetic: 31ms 31ms +0.8% 31ms 31ms +0.5%
    SimpleListComprehensions: 33ms 33ms +0.2% 33ms 33ms +0.4%
    SimpleListManipulation: 31ms 31ms -0.0% 32ms 32ms -1.5%
    SimpleLongArithmetic: 21ms 22ms -2.5% 22ms 22ms -1.8%
    SmallLists: 43ms 44ms -3.3% 43ms 44ms -3.5%
    SmallTuples: 51ms 53ms -2.9% 52ms 54ms -3.6%
    SpecialClassAttribute: 80ms 81ms -1.0% 80ms 81ms -1.1%
    SpecialInstanceAttribute: 42ms 42ms -0.5% 42ms 42ms -0.7%
    StringMappings: 87ms 86ms +0.8% 87ms 86ms +0.9%
    StringPredicates: 53ms 53ms +0.3% 53ms 53ms +0.2%
    StringSlicing: 48ms 49ms -1.3% 48ms 49ms -0.8%
    TryExcept: 25ms 25ms +0.6% 25ms 25ms +0.7%
    TryFinally: 35ms 35ms +1.1% 36ms 35ms +0.7%
    TryRaiseExcept: 13ms 13ms -0.5% 13ms 13ms -0.6%
    TupleSlicing: 46ms 46ms +0.2% 48ms 50ms -3.2%
    WithFinally: 54ms 53ms +0.8% 54ms 54ms +0.7%
    WithRaiseExcept: 44ms 43ms +2.8% 44ms 43ms +2.7%
    -------------------------------------------------------------------------------
    Totals: 2156ms 2150ms +0.3% 2171ms 2169ms +0.1%

    (this=pybench.version, other=pybench.orig)

    @malemburg
    Copy link
    Member

    On 19.04.2016 12:52, STINNER Victor wrote:

    As usual, I'm very skeptical on the pybench results which almost look like noise. I don't understand how my change can make any operation *faster*, whereas some benchmarks are faster with the patch...

    This can easily happen as a result of different memory layout, but is
    very much dependent on the machine architecture, CPU, memory type, etc.

    Dict microbenchmarks:

           DictCreation:    38ms    36ms   +4.8%    39ms    37ms   +3.9%
      DictWithFloatKeys:    40ms    40ms   -0.8%    40ms    40ms   -0.4%
    DictWithIntegerKeys:    33ms    31ms   +7.2%    33ms    31ms   +7.6%
     DictWithStringKeys:    29ms    28ms   +0.4%    29ms    29ms   +0.7%
    

    SimpleDictManipulation: 59ms 59ms -0.4% 59ms 59ms -0.4%

    Only dict creation and the integer keys benchmark results are relevant.

    Could you perhaps check what's causing these slowdowns ?

    @vstinner
    Copy link
    Member Author

    Could you perhaps check what's causing these slowdowns ?

    It's obvious, no? My patch causes the slowdown.

    On a timeit microbenchmark, I don't see such slowdown. That's also why
    I suspect that pybench is unstable.

    python3.6 -m timeit '{}' says 105 ns with and without the patch.

    python3.6 -m timeit 'd={}; d[1]=1; d[2]=2; d[3]=3; d[4]=4; d[5]=5;
    d[6]=6; d[7]=7; d[8]=8; d[9]=9; d[10]=10' says 838 ns with and without
    the patch.

    I have to "cheat": I run timeit enough times until I see the "minimum".

    @malemburg
    Copy link
    Member

    On 19.04.2016 13:11, STINNER Victor wrote:

    STINNER Victor added the comment:

    > Could you perhaps check what's causing these slowdowns ?

    It's obvious, no? My patch causes the slowdown.

    Well, yes, of course :-) I meant whether there's anything you
    can do about those slowdowns.

    On a timeit microbenchmark, I don't see such slowdown. That's also why
    I suspect that pybench is unstable.

    python3.6 -m timeit '{}' says 105 ns with and without the patch.

    python3.6 -m timeit 'd={}; d[1]=1; d[2]=2; d[3]=3; d[4]=4; d[5]=5;
    d[6]=6; d[7]=7; d[8]=8; d[9]=9; d[10]=10' says 838 ns with and without
    the patch.

    I have to "cheat": I run timeit enough times until I see the "minimum".

    Those operations are too fast for timeit. The overhead associated
    with looping is much larger than the time it takes to run the
    operation itself. That's why in pybench I put the operations into
    blocks of repeated statements. The interpreter then doesn't spend
    time on branching when going from one statement execution to the
    next (inside those blocks) and you get closer to the real runtime
    of the operation you're trying to measure.

    @vstinner
    Copy link
    Member Author

    Results of the CPython benchmark suite on dict_version-8.patch.

    IMHO regex_v8 can be ignored, this benchmark is unstable (issue bpo-26275).

    Original python: ../pep509/python
    3.6.0a0 (default:3a9b47b062b9, Apr 19 2016, 16:23:15)
    [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]

    Patched python: ../pep509/python
    3.6.0a0 (default:96f61aab2c6e, Apr 19 2016, 15:21:15)
    [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]

    $ python3 -u perf.py --rigorous -b all ../default.copy/python ../pep509/python

    INFO:root:Automatically selected timer: perf_counter
    (...)
    Report on Linux smithers 4.4.4-301.fc23.x86_64 #1 SMP Fri Mar 4 17:42:42 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    Min: 6.714927 -> 6.856992: 1.02x slower
    Avg: 6.726773 -> 6.984359: 1.04x slower
    Significant (t=-5.16)
    Stddev: 0.01160 -> 0.11111: 9.5816x larger

    ### call_method ###
    Min: 0.312567 -> 0.323260: 1.03x slower
    Avg: 0.313399 -> 0.323821: 1.03x slower
    Significant (t=-284.62)
    Stddev: 0.00042 -> 0.00048: 1.1430x larger

    ### call_simple ###
    Min: 0.242091 -> 0.228922: 1.06x faster
    Avg: 0.243613 -> 0.229294: 1.06x faster
    Significant (t=2134.23)
    Stddev: 0.00010 -> 0.00005: 1.9124x smaller

    ### etree_generate ###
    Min: 0.255097 -> 0.265558: 1.04x slower
    Avg: 0.256947 -> 0.267478: 1.04x slower
    Significant (t=-67.89)
    Stddev: 0.00111 -> 0.00108: 1.0250x smaller

    ### etree_parse ###
    Min: 0.281726 -> 0.290976: 1.03x slower
    Avg: 0.283626 -> 0.292642: 1.03x slower
    Significant (t=-62.59)
    Stddev: 0.00109 -> 0.00094: 1.1540x smaller

    ### etree_process ###
    Min: 0.217164 -> 0.229552: 1.06x slower
    Avg: 0.219401 -> 0.231242: 1.05x slower
    Significant (t=-75.05)
    Stddev: 0.00112 -> 0.00111: 1.0053x smaller

    ### fannkuch ###
    Min: 1.012875 -> 0.985196: 1.03x faster
    Avg: 1.014760 -> 0.992104: 1.02x faster
    Significant (t=46.44)
    Stddev: 0.00287 -> 0.00395: 1.3778x larger

    ### float ###
    Min: 0.259226 -> 0.251918: 1.03x faster
    Avg: 0.266530 -> 0.260176: 1.02x faster
    Significant (t=10.69)
    Stddev: 0.00403 -> 0.00437: 1.0836x larger

    ### json_load ###
    Min: 0.430602 -> 0.433593: 1.01x slower
    Avg: 0.431278 -> 0.486924: 1.13x slower
    Significant (t=-24.86)
    Stddev: 0.00045 -> 0.02238: 49.8974x larger

    ### normal_startup ###
    Min: 0.319092 -> 0.326082: 1.02x slower
    Avg: 0.320093 -> 0.326842: 1.02x slower
    Significant (t=-70.52)
    Stddev: 0.00067 -> 0.00069: 1.0283x larger

    ### regex_effbot ###
    Min: 0.048969 -> 0.048313: 1.01x faster
    Avg: 0.050019 -> 0.048415: 1.03x faster
    Significant (t=31.41)
    Stddev: 0.00051 -> 0.00006: 8.8152x smaller

    ### regex_v8 ###
    Min: 0.051040 -> 0.043412: 1.18x faster
    Avg: 0.051325 -> 0.043577: 1.18x faster
    Significant (t=45.60)
    Stddev: 0.00120 -> 0.00120: 1.0013x larger

    ### richards ###
    Min: 0.157625 -> 0.161126: 1.02x slower
    Avg: 0.158952 -> 0.162519: 1.02x slower
    Significant (t=-34.92)
    Stddev: 0.00073 -> 0.00071: 1.0351x smaller

    ### silent_logging ###
    Min: 0.070547 -> 0.069902: 1.01x faster
    Avg: 0.072246 -> 0.069934: 1.03x faster
    Significant (t=65.51)
    Stddev: 0.00035 -> 0.00003: 11.9828x smaller

    The following not significant results are hidden, use -v to show them:
    call_method_slots, call_method_unknown, chameleon_v2, chaos, django_v3, etree_iterparse, fastpickle, fastunpickle, formatted_logging, go, hexiom2, json_dump_v2, mako_v2, meteor_contest, nbody, nqueens, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, simple_logging, spectral_norm, startup_nosite, telco, tornado_http, unpack_sequence, unpickle_list.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 8, 2016

    New changeset d43f819caea7 by Victor Stinner in branch 'default':
    Add a new private version to the builtin dict type
    https://hg.python.org/cpython/rev/d43f819caea7

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 8, 2016

    The PEP-509 has been approved by Guido, I just pushed the implementation.

    @vstinner vstinner closed this as completed Sep 8, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 10, 2016

    New changeset c3776dd858f0 by Victor Stinner in branch 'default':
    Try to fix sizeof unit tests on dict
    https://hg.python.org/cpython/rev/c3776dd858f0

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants