Optimize heapify for better cache utilization #68343

Closed
rhettinger opened this issue May 10, 2015 · 2 comments
Assignees
Labels
extension-modules (C modules in the Modules dir), performance (Performance or resource usage)

Comments

@rhettinger
Contributor

BPO 24155
Nosy @rhettinger
Files
  • wordy_explanation.txt: More detailed write-up with ASCII diagrams
  • better_heapify.diff: Patch for cache friendly heapify()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2015-05-11.17:19:28.877>
    created_at = <Date 2015-05-10.04:52:28.417>
    labels = ['extension-modules', 'performance']
    title = 'Optimize heapify for better cache utililzation'
    updated_at = <Date 2015-05-11.17:19:28.876>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2015-05-11.17:19:28.876>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2015-05-11.17:19:28.877>
    closer = 'rhettinger'
    components = ['Extension Modules']
    creation = <Date 2015-05-10.04:52:28.417>
    creator = 'rhettinger'
    dependencies = []
    files = ['39332', '39333']
    hgrepos = []
    issue_num = 24155
    keywords = ['patch']
    message_count = 2.0
    messages = ['242850', '242903']
    nosy_count = 2.0
    nosy_names = ['rhettinger', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue24155'
    versions = ['Python 3.5']

    @rhettinger
    Contributor Author

    Heapify() is implemented with a series of siftup() operations that aggregate small heaps into bigger heaps.

    The current algorithm builds all the smallest heaps first, then makes all of the next largest size, and so on. This is not cache friendly because the aggregation step operates on smaller heaps built long before. The new algorithm performs the aggregation step immediately after its child heaps are built (while they are still likely to be in cache).

    The overall algorithm is the same (children built before parents). The number of comparisons is the same. And the resulting heap is identical. The only difference is performing work while the inputs are still in cache rather than waiting until all heaps at the current size have been built.

    For small heaps that already fit entirely in L1 cache, there is no benefit, so we stick with the current version which has less branching. For larger heaps, we switch to the new order.
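
    To make the ordering difference concrete, here is a minimal pure-Python sketch of the two schedules. It is an illustration only, not the patch itself (the actual change is in the C code of Modules/_heapqmodule.c, whose iteration scheme may differ); heapify_level_order mirrors the existing pure-Python heapify, and heapify_post_order is one hypothetical way to sift each parent right after its child heaps are finished.

```python
# Illustration only: the real change is in Modules/_heapqmodule.c; this
# pure-Python sketch just shows two orderings of the same siftup calls.
from heapq import _siftup   # CPython's internal "sift down" helper

def heapify_level_order(heap):
    # Current order: build all of the smallest heaps first, then the
    # next size up, and so on (reverse level order over internal nodes).
    n = len(heap)
    for i in reversed(range(n // 2)):
        _siftup(heap, i)

def heapify_post_order(heap):
    # Cache-friendly order (hypothetical sketch): sift a parent as soon
    # as its child sub-heaps are finished, so the children's data is
    # still likely to be in cache when the parent is aggregated.
    n = len(heap)

    def build(i):
        left, right = 2 * i + 1, 2 * i + 2
        if left < n:                 # internal node: build children first
            build(left)
            if right < n:
                build(right)
            _siftup(heap, i)         # children just built -> likely cached

    if n:
        build(0)
```

    Both schedules perform the same siftups with children handled before parents, so they produce identical heaps; only the order of the work differs.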

    The timings and benefits depend on the number and sizes of the objects being heapified as well as the relative speed of L1 to L2 to L3 to DRAM.

    ------------------------------------------------------------------

    For those who are interested, here are timings for heapifying shuffled lists of various sizes. The elements are tuples of length 1 that contain distinct integers.

    The first row gives the time to copy the data. It should be subtracted from the timings in the next two rows, which time the new algorithm versus the old algorithm.

    The benefits don't start to show up until after N is over 1000 (depending on the input type, the breakeven point seems to fall somewhere between 1200 and 2500 on my machine).
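
    The exact timing script is not attached to the issue, so the harness below is only a hypothetical reconstruction of how rows of this shape could be produced (three sorted timeit.repeat values per statement). The repeat count `number` and the name `heapify_old` are assumptions; in the original runs, heapify_old was presumably the pre-patch C implementation, whereas the stand-in here is pure Python, so absolute numbers will not match the data that follows.

```python
# Hypothetical reconstruction of the measurement; the real script is not
# attached to the issue.  `number` and `heapify_old` are assumptions.
import random
from timeit import repeat
from heapq import heapify, _siftup      # heapify: the (patched) C version

def heapify_old(heap):
    # Stand-in only: the issue compared two C builds.  Here the old
    # level-by-level order is emulated in pure Python, so the absolute
    # times will differ from those reported below.
    for i in reversed(range(len(heap) // 2)):
        _siftup(heap, i)

def run(N, number=100):                  # repeat count not stated in the issue
    tupledata = [(i,) for i in range(N)] # length-1 tuples of distinct integers
    random.shuffle(tupledata)
    ns = {'heapify': heapify, 'heapify_old': heapify_old, 'tupledata': tupledata}
    print('N =', N)
    for stmt in ('tupledata[:]',
                 'heapify(tupledata[:])',
                 'heapify_old(tupledata[:])'):
        print(sorted(repeat(stmt, repeat=3, globals=ns, number=number)), stmt)

for N in (100, 1000, 10000):
    run(N)
```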

    N = 100
    [2.9262970201671124e-05, 2.9265997000038624e-05, 2.9325950890779495e-05] tupledata[:]
    [0.0006274560000747442, 0.0006340609397739172, 0.0006361680570989847] heapify(tupledata[:])
    [0.0006139189936220646, 0.0006186790997162461, 0.000632670009508729] heapify_old(tupledata[:])

    [2.8867041692137718e-05, 2.8883921913802624e-05, 2.896797377616167e-05] tupledata[:]
    [0.000608008005656302, 0.0006171419518068433, 0.0006187589606270194] heapify(tupledata[:])
    [0.0006224410608410835, 0.000638791942037642, 0.0006388520123437047] heapify_old(tupledata[:])

    [2.89019662886858e-05, 2.8969021514058113e-05, 2.8973910957574844e-05] tupledata[:]
    [0.0006031119264662266, 0.0006048450013622642, 0.0006136660231277347] heapify(tupledata[:])
    [0.000612352043390274, 0.0006144039798527956, 0.0006217029877007008] heapify_old(tupledata[:])

    N = 1000
    [0.0002854769118130207, 0.0002856890205293894, 0.00028590403962880373] tupledata[:]
    [0.006836145068518817, 0.006866019102744758, 0.006885501905344427] heapify(tupledata[:])
    [0.0067316299537196755, 0.006792359985411167, 0.0067987809889018536] heapify_old(tupledata[:])

    [0.00028532894793897867, 0.0002853329060599208, 0.00028538203332573175] tupledata[:]
    [0.006822419003583491, 0.0068415619898587465, 0.006888034055009484] heapify(tupledata[:])
    [0.006734892027452588, 0.006814536056481302, 0.0068227669689804316] heapify_old(tupledata[:])

    [0.00028527993708848953, 0.0002854960039258003, 0.0002858199877664447] tupledata[:]
    [0.006787727936170995, 0.0067988099763169885, 0.006827510078437626] heapify(tupledata[:])
    [0.0067258820636197925, 0.006815586006268859, 0.006871008081361651] heapify_old(tupledata[:])

    N = 10000
    [0.004415847011841834, 0.004417525022290647, 0.0044295149855315685] tupledata[:]
    [0.07748138904571533, 0.07753941905684769, 0.07756883592810482] heapify(tupledata[:])
    [0.08400217199232429, 0.08420385408680886, 0.08428021904546767] heapify_old(tupledata[:])

    [0.004418709082528949, 0.004422315978445113, 0.004425868042744696] tupledata[:]
    [0.07753065403085202, 0.0775474050315097, 0.07755298691336066] heapify(tupledata[:])
    [0.08406145800836384, 0.08412359503563493, 0.08419332408811897] heapify_old(tupledata[:])

    [0.0044234748929739, 0.0044267530320212245, 0.0044296300038695335] tupledata[:]
    [0.07729987089987844, 0.07750388595741242, 0.07770221296232194] heapify(tupledata[:])
    [0.08401058206800371, 0.0840839499142021, 0.08423375408165157] heapify_old(tupledata[:])

    N = 100000
    [0.055330604896880686, 0.05594596697483212, 0.056045167963020504] tupledata[:]
    [1.2075877389870584, 1.207723677973263, 1.2084980909712613] heapify(tupledata[:])
    [1.56127171497792, 1.5691186729818583, 1.575164051959291] heapify_old(tupledata[:])

    [0.0558202009415254, 0.05597207904793322, 0.0560223578941077] tupledata[:]
    [1.2101711059221998, 1.211772706010379, 1.2120026310440153] heapify(tupledata[:])
    [1.5360360990744084, 1.5435883220052347, 1.5501357419416308] heapify_old(tupledata[:])

    [0.05999936908483505, 0.06000674597453326, 0.06018067698460072] tupledata[:]
    [1.209613809012808, 1.2116600699955598, 1.2144729839637876] heapify(tupledata[:])
    [1.5371010650414973, 1.5499007020844147, 1.5706949040759355] heapify_old(tupledata[:])

    N = 1000000
    [0.8224946830887347, 0.8234598189592361, 0.8247971039963886] tupledata[:]
    [18.152570085017942, 18.340327466023155, 18.413799613015726] heapify(tupledata[:])
    [19.786154965986498, 19.91440916794818, 19.952165015041828] heapify_old(tupledata[:])

    [0.8147928019752726, 0.8154206149047241, 0.8169217950198799] tupledata[:]
    [18.227028850931674, 18.265947047038935, 18.36190685792826] heapify(tupledata[:])
    [19.587209751014598, 19.62119024794083, 19.85366743709892] heapify_old(tupledata[:])

    [0.8098425359930843, 0.8100302360253409, 0.8104055189760402] tupledata[:]
    [18.16859135404229, 18.207948053022847, 18.350174001068808] heapify(tupledata[:])
    [19.6270396419568, 19.85634774295613, 20.017801710986532] heapify_old(tupledata[:])

    N = 10000000
    [16.074914730968885, 16.084022041060962, 16.150474293972366] tupledata[:]
    [205.83624146296643, 205.94496312504634, 205.96075649105478] heapify(tupledata[:])
    [349.2653319039382, 349.8653641429264, 351.06795906298794] heapify_old(tupledata[:])

    [16.07795425108634, 16.091183452052064, 16.10310253489297] tupledata[:]
    [205.1686675120145, 206.0234369279351, 206.10121345799416] heapify(tupledata[:])
    [348.308155576, 348.6673470499227, 348.7512100059539] heapify_old(tupledata[:])

    [16.074230056954548, 16.098330755950883, 16.106685669976287] tupledata[:]
    [205.18946332705673, 205.4753301080782, 205.5993042109767] heapify(tupledata[:])
    [349.5718760070158, 349.6067797210999, 351.29123334004544] heapify_old(tupledata[:])

    @rhettinger rhettinger self-assigned this May 10, 2015
    @rhettinger rhettinger added the extension-modules and performance labels May 10, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented May 11, 2015

    New changeset db87591fce01 by Raymond Hettinger in branch 'default':
    Issue bpo-24155: Optimize heapify for better cache utililzation.
    https://hg.python.org/cpython/rev/db87591fce01

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022