Issue 24681: Put most likely test first in set_add_entry()

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/68869

classification

Title:	Put most likely test first in set_add_entry()
Type:		Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	eric.snow, python-dev, r.david.murray, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2015-07-22 04:01 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
better_test_order.diff	rhettinger, 2015-07-22 04:01	Move most likely test first	review
better_test_order2.diff	rhettinger, 2015-07-22 05:51	And only save so->table just before the rich compare	review
move_likely_first.diff	rhettinger, 2015-07-23 12:06	Put most likely test first in set_add_entry()	review

Messages (18)
msg247086 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-22 04:01
Since the found_active exit is like the found_error exit in that it makes no further use of entry, it can be moved before the table/entry_key check whose purpose is to make sure the entry pointer is still valid. This change doesn't apply to lookkey() which makes downstream use of the entry pointer. In constrast, set_add_entry() is fully self-contained now and only returns a 0 or -1 rather than a pointer into the set table. This puts the most likely test case first, putting it ahead of the two memory reloads in table/entry_key check. Also, add an "else if" to the initial freeslot check to make it match the corresponding "else if" in the linear probe loop.
msg247092 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-22 05:51
One other change is to defer saving table=so->table until just before the rich comparison since it is only needed in that code block. This improves the generated code saving a register spill and reload on the most common code paths.
msg247099 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-07-22 07:29
Since it looks like an optimization, can you please provide a benchmark?
msg247109 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-22 11:33
You can benchmark if you want. I'm looking for a second pair of eyes to validate the correctness. My goal is to put the tests and assignments in the most logical order.
msg247114 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-07-22 12:41
> You can benchmark if you want. No, you have to provide a benchmark if you want to modify the Python core to optimize it. Otherwise, modifying the code is pointless. It's a general rule in Python to optimize code. We don't accept optimization changes if it has no effect on performances.
msg247116 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-07-22 13:22
The patch changes behavior. With the patch it would be possible that after successful set.add(), the item will be not contained in the set. And this behavior is not consistent with dict behavior.
msg247117 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-07-22 13:29
Victor, I'm hearing Raymond say that it isn't really about optimization, but about the logical organization of the code. I think making things more complicated requires a benchmark justification, but it doesn't look to me like this change makes things more complicated (on the other hand I'm not fluent in C).
msg247132 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-22 16:47
FWIW, my approach is to look at the most important code paths to see if there is any work being done that isn't essential for the result being computed. Next, I look at the generated assembly to estimate speed by counting memory accesses (and whether they are cached fresh accesses or stale random accesses) and I look at the branches (and whether they are predictable or not). The table=so->table assignment was being done for all code paths but was only needed around the rich compare. Here is the before and after for the most important path (the first lookup). Note that the change saves one memory spill and one reload. Before: ------- _set_add_entry: pushq %r15 pushq %r14 movq %rdx, %r14 pushq %r13 pushq %r12 movq %rdi, %r12 pushq %rbp movq %rsi, %rbp pushq %rbx subq $56, %rsp movq 40(%rdi), %rax addq $1, (%rsi) movq %rax, 16(%rsp) <-- spill movq 32(%r12), %rdx movq %rdx, %r15 andq %r14, %r15 movq %r15, %rbx salq $4, %rbx addq 16(%rsp), %rbx <-- reload movq (%rbx), %rcx testq %rcx, %rcx je L430 AFTER ----- _set_add_entry: pushq %r15 movq %rdx, %r15 pushq %r14 pushq %r13 pushq %r12 movq %rdi, %r12 pushq %rbp movq %rsi, %rbp pushq %rbx subq $56, %rsp movq 40(%rdi), %rdx addq $1, (%rsi) <-- no spill movq %rdx, %r11 L428: movq 32(%r12), %rcx movq %rcx, %r13 andq %r15, %r13 movq %r13, %rbx salq $4, %rbx addq %r11, %rbx <-- from register movq (%rbx), %r14 testq %r14, %r14 je L429 The code around the rich compare used to do memory loads that weren't necessary for the most likely case (since the 64-bit hash values match, it is very likely that the comparison will report a match). BEFORE ------ call _PyObject_RichCompareBool movq 24(%rsp), %rcx movq (%rcx), %rdi leaq -1(%rdi), %rdx testq %rdx, %rdx movq %rdx, (%rcx) je L489 testl %eax, %eax js L437 <--- predictable error branch movq 40(%r12), %rdx <--- memory load cmpq 16(%rsp), %rdx <--- memory load jne L460 cmpq (%rbx), %rcx <--- memory load jne L429 <--- predictable restart branch testl %eax, %eax <--- predictable found_active branch jne L432 <--- most common exit point movq 32(%r12), %rdx AFTER ----- call _PyObject_RichCompareBool movq 16(%rsp), %rcx movq (%rcx), %rdi leaq -1(%rdi), %rdx testq %rdx, %rdx movq %rdx, (%rcx) je L485 cmpl $0, %eax jg L431 <-- common exit before the memory loads! L490: jne L434 movq 40(%r12), %rdx <--- memory load cmpq %rdx, 24(%rsp) <--- memory load movq %rdx, %r11 jne L428 cmpq (%rbx), %rcx <--- memory load jne L428
msg247135 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-22 17:02
> "you have to provide a benchmark" Actually, I don't. When making a small series of changes, benchmarking every step is waste of time and tends to trap you in local minimums and causes you to overfit to a particular processor, compiler, or benchmark. The better process is to carefully work through what the code is telling the machine to do and evaluating whether those steps make sense. This is done in conjunction with organizing the code in a more logical manner (i.e. only saving the so->table in the block where we're using it as a check to check if the rich comparison rearranged the table in a way the invalidated the entry pointer or made the current search invalid). In general, less work is better, having related actions take place close together is better, making functions self-contained is better, etc. If you want to team-up and help, your contribution is welcome. General sniping isn't helpful at all. I wrote all of this code and have maintained it for 13 years -- this series of refactorings has been something I've been working towards for a long time.
msg247148 - (view)	Author: Eric Snow (eric.snow) *	Date: 2015-07-22 19:10
Thanks for the clear explanation, Raymond. The approach you've described is useful in a number of circumstances. Would you mind publishing (somewhere outside the tracker; devguide?) the specific steps you take and the tools you use?
msg247155 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-07-22 23:29
For me, optimizing assembler is still an optimization. I give up, I just don't care of set performances.
msg247156 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-23 00:27
If PyObject_RichCompareBool() reports that a key is already present in the set, then set_add_entry() is under no further obligation to make an insertion. As long as there is no risk of segfault, there is no more obligation to cater to lying or self-destructing __eq__ methods than there is to support objects that return randomized __hash__ values. The docs for set.add(...): "Add an element to a set. This has no effect if the element is already present." The latter condition is determined by the PyObject_RichCompareBool() check. If it reports a match, then it is reasonable to do nothing. FWIW, dicts don't have a choice in this regard because they still have an implementation that depends on returning a currently valid entry which they can't do if the table is mutating during the search. The set_add_entry() implementation has the advantage in that it is self-contained and need only report -1 is an error arose during the comparison or a 0 to indicate that no error arose. Also, note that sets diverged from dicts that the outset in that they don't swallow exceptions like PyDict_GetItem() does.
msg247178 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-07-23 08:55
The condition that is determined by the PyObject_RichCompareBool() check is not that the element is already present, but that the element was present. Even if we will agree that such behavior change is appropriate, the inconsistency with dict makes it doubtful (originally set was implemented with dict). I have no objections against tweaking so->table caching. I even think this part of the patch improves the code.
msg247185 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-23 11:36
Sets are under no obligation to keep their code synced with dicts and have over time diverged in a number of ways.
msg247186 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-07-23 11:42
New changeset bc80c783c4ab by Raymond Hettinger in branch 'default': Issue #24681: Move the store of so->table to the code block where it is used. https://hg.python.org/cpython/rev/bc80c783c4ab
msg247258 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2015-07-24 05:20
Serhiy, do you see anything actually wrong with the patch? If it isn't broken in some clear cut way, I'm going to apply it shortly. The comparison with dict internals is a red herring -- there is no promise need or precedent for making that code exactly the same (nor is there is restriction on PyPy or Jython which have already traveled down other paths not aligned with cpyhton dict internals).
msg247260 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2015-07-24 05:41
The patch makes set_add_entry() inconsistent not only with dict, but with other set methods that use set_lookkey(). I don't know if there is some subtle bug here, but I feel myself slightly uncomfortable with it. Also the new code looks a little less readable to me. Current code is straightforward translation from Python to C (first check error, then check that the result still is actual, finally check the result itself), the new code forces you to stop and think, it's correctness is not so obvious. Even if all good with the patch, please don't commit it right now. Wait at least a week and then commit. May be someone other will find a flaw in the patch.
msg247747 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-07-31 14:59
New changeset 9e3be159d023 by Raymond Hettinger in branch 'default': Issue #24681: Move the most likely test first in set_add_entry(). https://hg.python.org/cpython/rev/9e3be159d023

History
Date	User	Action	Args
2022-04-11 14:58:19	admin	set	github: 68869
2015-07-31 14:59:57	rhettinger	set	status: open -> closed resolution: fixed
2015-07-31 14:59:09	python-dev	set	messages: + msg247747
2015-07-25 22:30:04	vstinner	set	nosy: - vstinner
2015-07-24 05:41:39	serhiy.storchaka	set	messages: + msg247260
2015-07-24 05:20:14	rhettinger	set	messages: + msg247258
2015-07-23 12:06:59	rhettinger	set	files: + move_likely_first.diff
2015-07-23 12:03:27	rhettinger	set	files: - move_likely_first.diff
2015-07-23 11:45:47	rhettinger	set	files: + move_likely_first.diff
2015-07-23 11:45:09	rhettinger	set	assignee: serhiy.storchaka -> rhettinger
2015-07-23 11:42:43	python-dev	set	nosy: + python-dev messages: + msg247186
2015-07-23 11:36:37	rhettinger	set	messages: + msg247185
2015-07-23 08:55:32	serhiy.storchaka	set	messages: + msg247178
2015-07-23 00:27:42	rhettinger	set	messages: + msg247156
2015-07-22 23:29:01	vstinner	set	nosy: rhettinger, vstinner, r.david.murray, eric.snow, serhiy.storchaka messages: + msg247155
2015-07-22 19:10:51	eric.snow	set	nosy: + eric.snow messages: + msg247148
2015-07-22 17:02:26	rhettinger	set	messages: + msg247135
2015-07-22 16:47:34	rhettinger	set	messages: + msg247132
2015-07-22 13:29:23	r.david.murray	set	nosy: + r.david.murray messages: + msg247117
2015-07-22 13:22:36	serhiy.storchaka	set	messages: + msg247116
2015-07-22 12:41:31	vstinner	set	messages: + msg247114
2015-07-22 11:33:24	rhettinger	set	messages: + msg247109
2015-07-22 07:29:45	vstinner	set	nosy: + vstinner messages: + msg247099
2015-07-22 05:51:58	rhettinger	set	files: + better_test_order2.diff messages: + msg247092
2015-07-22 05:26:40	rhettinger	set	title: Put most likely test first is set_add_entry() -> Put most likely test first in set_add_entry()
2015-07-22 04:01:49	rhettinger	create