
Author rhettinger
Recipients pitrou, rhettinger, serhiy.storchaka
Date 2015-01-25.20:31:31
Message-id <1422217891.56.0.664510306819.issue23269@psf.upfronthosting.co.za>
Content
Reopening this with a better patch that nicely tightens the inner loop without an algorithmic change or any downside.

Currently, the entry computation adds a constant (i + j), masks it (& mask) to wrap on overflow, scales it to a table offset (<< 4), and then does the key lookup (one memory access). These steps are sequential (not parallelizable): each one depends on the result of the previous one.
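A minimal sketch of the current computation, using a hypothetical simplified entry type (the real code is in Objects/setobject.c):

```c
#include <stddef.h>

/* Hypothetical simplified entry type. In the real table each entry is
   16 bytes, which is why scaling the index to a byte offset is a
   shift by 4 (done implicitly here by array indexing). */
typedef struct { void *key; long hash; } setentry;

/* Each step feeds the next: add (i + j), mask to wrap on overflow,
   scale to a byte offset, then load. The data dependencies serialize
   the chain. */
static setentry *
current_probe(setentry *table, size_t mask, size_t i, size_t j)
{
    return &table[(i + j) & mask];
}
```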

The patch moves the wrap-on-overflow decision out of the inner loop, adding a single, fast, predictable branch (i + LINEAR_PROBES > mask). The inner loop then simplifies to an add, a fetch, and a test (no masking or shifting).
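A sketch of the patched shape, again with hypothetical simplified types and a hypothetical helper name (find_free_slot); the wrap check is hoisted so the common fast path probes consecutive entries with a bare pointer increment:

```c
#include <stddef.h>

#define LINEAR_PROBES 9

/* Hypothetical simplified entry type standing in for the real one in
   Objects/setobject.c. */
typedef struct { void *key; long hash; } setentry;

/* Return the first free slot in the probe window starting at i,
   or NULL if all LINEAR_PROBES + 1 slots are occupied. */
static setentry *
find_free_slot(setentry *table, size_t mask, size_t i)
{
    setentry *entry;
    size_t j;

    if (i + LINEAR_PROBES <= mask) {
        /* Fast path: the window cannot wrap, so the loop body is just
           an add (entry++), a fetch, and a test -- no masking. */
        entry = &table[i];
        for (j = 0; j <= LINEAR_PROBES; j++, entry++) {
            if (entry->key == NULL)
                return entry;
        }
    }
    else {
        /* Slow path, taken rarely: mask each index to wrap on overflow. */
        for (j = 0; j <= LINEAR_PROBES; j++) {
            entry = &table[(i + j) & mask];
            if (entry->key == NULL)
                return entry;
        }
    }
    return NULL;
}
```

The single range check before the loop is highly predictable (almost always true on large tables), so the branch predictor hides its cost.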

The generated code on Clang is very tight:

LBB29_26:      
    cmpq    $0, (%rsi)       # entry->key == NULL
    je  LBB29_30
    incq    %rdx             # j++
    addq    $16, %rsi        # entry++  (done in parallel with the incq)
    cmpq    $9, %rdx         # j <= LINEAR_PROBES
    jbe LBB29_26

On gcc-4.9, the loop is unrolled and we get even better code:

    leaq    (%rbx,%rsi), %rdx   # entry = table[i]
    cmpq    $0, (%rdx)          # entry.key == NULL
    je  L223

    leaq    16(%rbx,%rsi), %rdx # entry = table[i+1];
    cmpq    $0, (%rdx)          # entry.key == NULL
    je  L223

    leaq    32(%rbx,%rsi), %rdx
    cmpq    $0, (%rdx)
    je  L223