Message234308
First draft of patch to switch from a table[(i+j)&mask] style of entry calculation to an entry++ style. The entry computation simplifies from add/shl4/and/lea to a single add16. To do this, the linear probes are limited to the length of table rather than wrapping-around.
I haven't timed this one yet (have just looked at the assembly) or worked through the implications. The new approach is a little less attractive but may be easier to reason about than the mask wrap-around. Also, it disadvantages small sets where the wrap-around property assured that the linear probes would always fine a hit without any entry being checked more than once. There is a extra setup time before the linear probes to compute the limit. Also, since the loop size is now variable instead of fixed, the inner loop for set_insert_clean() no longer gets unrolled by either GCC or CLANG.
With the change, the inner loop of set_insert_clean() is very tight:
L436:
addq $16, %rax #, entry
entry < limit
cmpq %rdx, %rax # limit, entry
ja L399 #,
entry->key == NULL
cmpq $0, (%rax) #,* entry
jne L436
The code for lookkey() is also tight (though I can't tell why the entry->key gets loaded from memory twice):
L102:
cmpq %r15, %r13 # tmp170, D.12706 entry->key == dummy
jne L103 #,
testq %r12, %r12 # freeslot
cmove %r14, %r12 # entry -> freeslot
L103:
addq $16, %r14 #, entry entry++
cmpq %r14, %rbx # entry, limit
jb L146 #,
movq (%r14), %r13 # MEM[base: entry_65, offset: 0B], D.12706
testq %r13, %r13 # D.12706 entry->key == NULL
je L147 #,
cmpq %rbp, 8(%r14) # hash, MEM[base: entry_104, offset: 8B]
je L148 #, entry->hash == hash
? movq (%r14), %r13 # MEM[base: entry_104, offset: 0B], D.12706
jmp L102 # |
|
Date |
User |
Action |
Args |
2015-01-19 09:15:42 | rhettinger | set | recipients:
+ rhettinger |
2015-01-19 09:15:42 | rhettinger | set | messageid: <1421658942.27.0.429398283891.issue23269@psf.upfronthosting.co.za> |
2015-01-19 09:15:41 | rhettinger | link | issue23269 messages |
2015-01-19 09:15:40 | rhettinger | create | |
|