
dict creation performance regression #60669

Closed
phihag mannequin opened this issue Nov 13, 2012 · 23 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

@phihag commented Nov 13, 2012

BPO 16465
Nosy @tim-one, @rhettinger, @jcea, @pitrou, @vstinner, @tiran, @benjaminp, @asvetlov, @serhiy-storchaka, @MojoVampire
Superseder:
  • bpo-23601: use small object allocator for dict key storage

Files:
  • dict_free_key_list.patch: Patch for using the free list for keys

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-03-11.13:02:04.158>
    created_at = <Date 2012-11-13.16:19:04.141>
    labels = ['interpreter-core', 'performance']
    title = 'dict creation performance regression'
    updated_at = <Date 2016-03-11.15:33:52.236>
    user = 'https://bugs.python.org/phihag'

    bugs.python.org fields:

    activity = <Date 2016-03-11.15:33:52.236>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-03-11.13:02:04.158>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2012-11-13.16:19:04.141>
    creator = 'phihag'
    dependencies = []
    files = ['27983']
    hgrepos = []
    issue_num = 16465
    keywords = ['patch', '3.3regression']
    message_count = 23.0
    messages = ['175503', '175504', '175507', '175511', '175528', '175531', '175546', '175551', '175561', '175563', '175567', '175581', '175583', '175586', '175615', '183211', '205362', '205363', '205397', '224937', '224939', '261565', '261576']
    nosy_count = 13.0
    nosy_names = ['tim.peters', 'rhettinger', 'jcea', 'pitrou', 'vstinner', 'christian.heimes', 'benjamin.peterson', 'phihag', 'asvetlov', 'pingebretson', 'jcon', 'serhiy.storchaka', 'josh.r']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = '23601'
    type = 'performance'
    url = 'https://bugs.python.org/issue16465'
    versions = ['Python 3.5']

@phihag commented Nov 13, 2012

    On my system, {} has become significantly slower in 3.3:

    $ python3.2 -m timeit -n 1000000 '{}'
    1000000 loops, best of 3: 0.0314 usec per loop
    $ python3.3 -m timeit -n 1000000 '{}'
    1000000 loops, best of 3: 0.0892 usec per loop
    $ hg id -i
    ee7b713fec71+
    $ ./python -m timeit -n 1000000 '{}'
    1000000 loops, best of 3: 0.0976 usec per loop

    Is this because of the dict randomization?
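The shell one-liners above can also be reproduced from inside a single interpreter with the timeit module, which avoids per-process startup differences. A minimal sketch (absolute numbers vary by machine and build):

```python
# Time one million evaluations of an empty dict display, mirroring
# `python -m timeit -n 1000000 '{}'` from within one interpreter.
import timeit

total = timeit.timeit('{}', number=1_000_000)
per_loop_usec = total / 1_000_000 * 1e6
print(f"{per_loop_usec:.4f} usec per loop")
```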

@phihag added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and performance (Performance or resource usage) labels on Nov 13, 2012
@serhiy-storchaka commented:
    I confirm that.

    $ ./python -m timeit -n 1000000 '{};{};{};{};{};{};{};{};{};{}'

    2.6: 0.62 usec
    2.7: 0.57 usec
    3.1: 0.55 usec
    3.2: 0.48 usec
    3.3: 1.5 usec

Randomization does not affect it.

@serhiy-storchaka commented:
Ah, this is an effect of PEP 412. The difference exists only when creating dicts of up to 5 items (inclusive).
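Since the observation above is that only small dicts are affected, the distinction can be made measurable by timing a small and a larger dict creation separately. A hedged sketch (the literals and iteration counts are illustrative, not from the original report):

```python
# Compare creation cost of a small dict literal (the affected case)
# against a 100-item dict built by comprehension (the unaffected case).
# Absolute numbers vary by machine; only the relative trend matters.
import timeit

small = timeit.timeit('{1: 1, 2: 2}', number=100_000)
large = timeit.timeit('{i: i for i in range(100)}', number=100_000)
print(f"small literal: {small:.3f}s  100-item dict: {large:.3f}s")
```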

@serhiy-storchaka commented:
Maybe using the free list for keys will restore performance.
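The attached patch itself is not shown in this thread; as a rough illustration of the free-list idea it proposes, here is a hypothetical Python sketch: keep a bounded stack of recently released objects and hand them back out before allocating fresh ones. All names here are illustrative, not CPython's actual C-level API.

```python
# Hypothetical sketch of a free list: reuse recently released "keys"
# objects instead of allocating new ones. Plain dicts stand in for
# CPython's internal PyDictKeysObject.
_FREE_LIST: list = []
_MAX_FREE = 80  # bounded cap, in the spirit of CPython's free lists

def acquire_keys_object() -> dict:
    # Pop a cached object if one is available, else allocate fresh.
    if _FREE_LIST:
        return _FREE_LIST.pop()
    return {}

def release_keys_object(obj: dict) -> None:
    # Return the object to the cache unless the cache is full.
    if len(_FREE_LIST) < _MAX_FREE:
        obj.clear()
        _FREE_LIST.append(obj)

k = acquire_keys_object()
release_keys_object(k)
assert acquire_keys_object() is k  # the cached object is reused
```

The point of the technique is that the allocator is skipped entirely on the hot path, which is why it can help a workload dominated by creating and destroying many short-lived small dicts.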

@pitrou commented Nov 13, 2012

    Does this regression impact any real-world program?

@rhettinger commented:
> Does this regression impact any real-world program?

That is a blow-off response. A huge swath of the language is affected by dictionary performance (keyword args, module lookups, attribute lookup, etc). Most programs will be affected to some degree -- compute-bound programs will notice more and I/O-bound programs won't notice at all.

@pitrou commented Nov 14, 2012

    > Does this regression impact any real-world program?

> That is a blow-off response. A huge swath of the language is affected
> by dictionary performance (keyword args, module lookups, attribute
> lookup, etc).

    I was merely suggesting to report actual (non-micro) benchmark numbers.
    This issue is about dict creation, not dict lookups.

@serhiy-storchaka commented:
As I understand it, a new dict is created on every call of a function with keyword arguments. This slows down every such call by about 0.1 µsec, which is about 10% of int('42', base=16). In sum, some programs can slow down by a few percent.
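The per-call dict can be demonstrated directly: keyword arguments are collected into a dict, and each call builds a fresh one. A minimal sketch (the function name is illustrative):

```python
# Each call with keyword arguments materializes a fresh dict object:
# the two results compare equal, but are distinct objects.
def grab(**kwargs):
    return kwargs

a = grab(x=1)
b = grab(x=1)
assert a == b       # same contents...
assert a is not b   # ...but a new dict was allocated for each call
```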

    Direct comparison of 3.2 and 3.3 is meaningless because of the influence of other factors (first of all PEP-393).

@pitrou commented Nov 14, 2012

> As I understand it, a new dict is created on every call of a function
> with keyword arguments. This slows down every such call by about
> 0.1 µsec, which is about 10% of int('42', base=16).

    Ok, but int('42', base=16) is about the fastest function call with
    keyword arguments one can think about :-) In other words, the overhead
    will probably not be noticeable for most function calls.

@serhiy-storchaka commented:
> Ok, but int('42', base=16) is about the fastest function call with
> keyword arguments one can think about :-)

    Not as fast as a call without keywords, int('42', 16). :-( But this is a different issue.
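The two call forms compared above are equivalent in result; only the calling convention, and hence the per-call overhead, differs:

```python
# Positional and keyword forms of the same call produce the same value;
# the keyword form additionally pays for building a kwargs dict.
assert int('42', 16) == int('42', base=16) == 0x42  # both parse to 66
```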

@serhiy-storchaka commented:
Here is a really simple patch. It speeds up empty dict creation by up to 1.9x (still 1.6x slower than 3.2). Please make your own measurements of real-world programs.

@pitrou commented Nov 14, 2012

    Ok, here are some benchmark results with your patch:

    ### call_method ###
    Min: 0.313816 -> 0.305150: 1.03x faster
    Avg: 0.316292 -> 0.305678: 1.03x faster
    Significant (t=21.07)
    Stddev: 0.00188 -> 0.00050: 3.7398x smaller
    Timeline: http://tinyurl.com/couzwbe

    ### call_method_unknown ###
    Min: 0.323125 -> 0.312034: 1.04x faster
    Avg: 0.325697 -> 0.313115: 1.04x faster
    Significant (t=17.62)
    Stddev: 0.00258 -> 0.00099: 2.6059x smaller
    Timeline: http://tinyurl.com/clwacj3

    ### call_simple ###
    Min: 0.215008 -> 0.220773: 1.03x slower
    Avg: 0.215666 -> 0.221657: 1.03x slower
    Significant (t=-23.28)
    Stddev: 0.00062 -> 0.00078: 1.2721x larger
    Timeline: http://tinyurl.com/cv6gxft

    ### fastpickle ###
    Min: 0.554353 -> 0.569768: 1.03x slower
    Avg: 0.558192 -> 0.574705: 1.03x slower
    Significant (t=-6.90)
    Stddev: 0.00363 -> 0.00393: 1.0829x larger
    Timeline: http://tinyurl.com/c8s2dqk

    ### float ###
    Min: 0.292259 -> 0.279563: 1.05x faster
    Avg: 0.296011 -> 0.286154: 1.03x faster
    Significant (t=2.99)
    Stddev: 0.00417 -> 0.00609: 1.4585x larger
    Timeline: http://tinyurl.com/c34ofe5

    ### iterative_count ###
    Min: 0.115091 -> 0.118527: 1.03x slower
    Avg: 0.116174 -> 0.119266: 1.03x slower
    Significant (t=-6.82)
    Stddev: 0.00068 -> 0.00075: 1.1148x larger
    Timeline: http://tinyurl.com/bsmuuso

    ### json_dump_v2 ###
    Min: 2.591838 -> 2.652666: 1.02x slower
    Avg: 2.599335 -> 2.687225: 1.03x slower
    Significant (t=-8.09)
    Stddev: 0.00715 -> 0.02322: 3.2450x larger
    Timeline: http://tinyurl.com/clxutkt

    ### json_load ###
    Min: 0.378329 -> 0.370132: 1.02x faster
    Avg: 0.379840 -> 0.371222: 1.02x faster
    Significant (t=13.68)
    Stddev: 0.00112 -> 0.00085: 1.3237x smaller
    Timeline: http://tinyurl.com/c7ycte2

    ### nbody ###
    Min: 0.259945 -> 0.251294: 1.03x faster
    Avg: 0.261186 -> 0.251843: 1.04x faster
    Significant (t=14.95)
    Stddev: 0.00125 -> 0.00062: 2.0274x smaller
    Timeline: http://tinyurl.com/cg8z6yg

    ### nqueens ###
    Min: 0.295304 -> 0.275767: 1.07x faster
    Avg: 0.298382 -> 0.278839: 1.07x faster
    Significant (t=10.19)
    Stddev: 0.00244 -> 0.00353: 1.4429x larger
    Timeline: http://tinyurl.com/bo2dn4x

    ### pathlib ###
    Min: 0.100494 -> 0.102371: 1.02x slower
    Avg: 0.101787 -> 0.104876: 1.03x slower
    Significant (t=-8.15)
    Stddev: 0.00103 -> 0.00159: 1.5432x larger
    Timeline: http://tinyurl.com/cb9w79t

    ### richards ###
    Min: 0.166939 -> 0.172828: 1.04x slower
    Avg: 0.167945 -> 0.174199: 1.04x slower
    Significant (t=-9.65)
    Stddev: 0.00092 -> 0.00112: 1.2241x larger
    Timeline: http://tinyurl.com/cdros5x

    ### silent_logging ###
    Min: 0.070485 -> 0.066857: 1.05x faster
    Avg: 0.070538 -> 0.067041: 1.05x faster
    Significant (t=44.64)
    Stddev: 0.00003 -> 0.00017: 5.4036x larger
    Timeline: http://tinyurl.com/buz5h9j

    ### 2to3 ###
    0.490925 -> 0.478927: 1.03x faster

    ### chameleon ###
    Min: 0.050694 -> 0.049364: 1.03x faster
    Avg: 0.051030 -> 0.049759: 1.03x faster
    Significant (t=8.45)
    Stddev: 0.00041 -> 0.00041: 1.0036x larger
Timeline: http://tinyurl.com/d5czdjt

    ### mako ###
    Min: 0.183295 -> 0.177191: 1.03x faster
    Avg: 0.184944 -> 0.179300: 1.03x faster
    Significant (t=15.54)
    Stddev: 0.00119 -> 0.00137: 1.1555x larger
Timeline: http://tinyurl.com/dx6xrvx

    In the end, it's mostly a wash.
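One way to back up a "mostly a wash" summary is a geometric mean of the per-benchmark speedup ratios. A sketch using a sample of the ratios reported above, with "1.03x slower" encoded as 1/1.03 (the exact selection of benchmarks is illustrative):

```python
# Summarize mixed benchmark results with a geometric mean of speedup
# ratios (>1 means faster with the patch). Values sampled from the
# runs above; slower results are encoded as reciprocals.
import math

ratios = [1.03, 1.04, 1/1.03, 1/1.03, 1.05, 1/1.03, 1/1.03, 1.02,
          1.03, 1.07, 1/1.03, 1/1.04, 1.05, 1.03, 1.03, 1.03]
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(f"overall: {geomean:.3f}x")  # close to 1.0 -> "mostly a wash"
```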

@jcea commented Nov 14, 2012

    Antoine, I would consider this a performance regression to solve for 3.3.1. Small dictionary creation is everywhere in CPython.

@pitrou commented Nov 14, 2012

> Antoine, I would consider this a performance regression to solve for
> 3.3.1. Small dictionary creation is everywhere in CPython.

    Again, feel free to provide real-world benchmark numbers proving the
    regression. I've already posted some figures.

@serhiy-storchaka commented:
> In the end, it's mostly a wash.

Yes, that's predictable. The gain is too small and indistinguishable against a background of random noise. Maybe a longer-running benchmark would show more reliable results, but in any case the gain is small.

My long-running, heavily optimized simulation is about 5% faster on 3.2 or patched 3.4 compared to 3.3. But this is not a typical program.

@tiran commented Feb 28, 2013

The patch gives a measurable speedup (./python is a patched 3.3.0+). IMO we should apply it. It's small, and I see no harm in it.

    $ for PY in python2.7 python3.2 python3.3 ./python; do cmd="$PY -R -m timeit -n 10000000 '{};{};{};{};{};{};{};{};{};{}'"; echo $cmd; eval $cmd; done
    python2.7 -R -m timeit -n 10000000 '{};{};{};{};{};{};{};{};{};{}'
    10000000 loops, best of 3: 0.162 usec per loop
    python3.2 -R -m timeit -n 10000000 '{};{};{};{};{};{};{};{};{};{}'
    10000000 loops, best of 3: 0.142 usec per loop
    python3.3 -R -m timeit -n 10000000 '{};{};{};{};{};{};{};{};{};{}'
    10000000 loops, best of 3: 0.669 usec per loop
    ./python -R -m timeit -n 10000000 '{};{};{};{};{};{};{};{};{};{}'
    10000000 loops, best of 3: 0.381 usec per loop
    
    $ for PY in python2.7 python3.2 python3.3 ./python; do cmd="$PY -R -m timeit -n 10000000 'int(\"1\", base=16)'"; echo $cmd; eval $cmd; done
    python2.7 -R -m timeit -n 10000000 'int("1", base=16)'
    10000000 loops, best of 3: 0.268 usec per loop
    python3.2 -R -m timeit -n 10000000 'int("1", base=16)'
    10000000 loops, best of 3: 0.302 usec per loop
    python3.3 -R -m timeit -n 10000000 'int("1", base=16)'
    10000000 loops, best of 3: 0.477 usec per loop
    ./python -R -m timeit -n 10000000 'int("1", base=16)'
    10000000 loops, best of 3: 0.356 usec per loop

@serhiy-storchaka commented:
So what should we do with this?

@pitrou commented Dec 6, 2013

    You said it yourself: """The gain is too small and undistinguished on a background of random noise""". Closing would be reasonable, unless someone exhibits a reasonable benchmark where there is a significant difference.

    In any case, it's too late for perf improvements on 3.4.

@rhettinger commented:
    This patch looks correct and like the right thing to do. I recommend applying it to 3.5.

@serhiy-storchaka commented:
Antoine, are you still opposed to this patch?

    @serhiy-storchaka serhiy-storchaka self-assigned this Aug 6, 2014
@pitrou commented Aug 6, 2014

    The patch adds complication to the already complicated memory management of dicts. It increases maintenance burden in critical code.

    Have we found any case where it makes a tangible difference?
    (I'm not talking about timeit micro-benchmarks)

@serhiy-storchaka commented:
Closed in favor of bpo-23601.

@vstinner commented:
    Serhiy Storchaka:

> Closed in favor of bpo-23601.

    Cool!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022