This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Put most likely test first in set_add_entry()
Type: Stage: patch review
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: eric.snow, python-dev, r.david.murray, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-07-22 04:01 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
better_test_order.diff rhettinger, 2015-07-22 04:01 Move most likely test first review
better_test_order2.diff rhettinger, 2015-07-22 05:51 And only save so->table just before the rich compare review
move_likely_first.diff rhettinger, 2015-07-23 12:06 Put most likely test first in set_add_entry() review
Messages (18)
msg247086 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-22 04:01
Since the *found_active* exit is like the *found_error* exit in that it makes no further use of *entry*, it can be moved before the table/entry_key check whose purpose is to make sure the *entry* pointer is still valid.  This change doesn't apply to lookkey() which makes downstream use of the entry pointer.  In constrast, set_add_entry() is fully self-contained now and only returns a 0 or -1 rather than a pointer into the set table.

This puts the most likely test case first, putting it ahead of the two memory reloads in table/entry_key check.

Also, add an "else if" to the initial freeslot check to make it match the corresponding "else if" in the linear probe loop.
msg247092 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-22 05:51
One other change is to defer saving table=so->table until just before the rich comparison since it is only needed in that code block.  This improves the generated code saving a register spill and reload on the most common code paths.
msg247099 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-22 07:29
Since it looks like an optimization, can you please provide a benchmark?
msg247109 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-22 11:33
You can benchmark if you want.  I'm looking for a second pair of eyes to validate the correctness.  My goal is to put the tests and assignments in the most logical order.
msg247114 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-22 12:41
> You can benchmark if you want.

No, you have to provide a benchmark if you want to modify the Python core to optimize it. Otherwise, modifying the code is pointless.

It's a general rule in Python to optimize code. We don't accept optimization changes if it has no effect on performances.
msg247116 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-22 13:22
The patch changes behavior. With the patch it would be possible that after successful set.add(), the item will be not contained in the set. And this behavior is not consistent with dict behavior.
msg247117 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-07-22 13:29
Victor, I'm hearing Raymond say that it isn't really about optimization, but about the logical organization of the code.  I think making things more *complicated* requires a benchmark justification, but it doesn't look to me like this change makes things more complicated (on the other hand I'm not *fluent* in C).
msg247132 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-22 16:47
FWIW, my approach is to look at the most important code
paths to see if there is any work being done that isn't
essential for the result being computed.

Next, I look at the generated assembly to estimate speed
by counting memory accesses (and whether they are cached
fresh accesses or stale random accesses) and I look at
the branches (and whether they are predictable or not).

The table=so->table assignment was being done for all code
paths but was only needed around the rich compare.  Here
is the before and after for the most important path
(the first lookup).  Note that the change saves one memory
spill and one reload.

Before:
-------
_set_add_entry:
pushq   %r15
pushq   %r14
movq    %rdx, %r14
pushq   %r13
pushq   %r12
movq    %rdi, %r12
pushq   %rbp
movq    %rsi, %rbp
pushq   %rbx
subq    $56, %rsp
movq    40(%rdi), %rax
addq    $1, (%rsi)
movq    %rax, 16(%rsp)        <-- spill
movq    32(%r12), %rdx        
movq    %rdx, %r15
andq    %r14, %r15
movq    %r15, %rbx
salq    $4, %rbx
addq    16(%rsp), %rbx        <-- reload
movq    (%rbx), %rcx
testq   %rcx, %rcx
je  L430


AFTER
-----
_set_add_entry:
	pushq	%r15
	movq	%rdx, %r15
	pushq	%r14
	pushq	%r13
	pushq	%r12
	movq	%rdi, %r12
	pushq	%rbp
	movq	%rsi, %rbp
	pushq	%rbx
	subq	$56, %rsp
	movq	40(%rdi), %rdx
	addq	$1, (%rsi)        <-- no spill
	movq	%rdx, %r11
L428:
	movq	32(%r12), %rcx
	movq	%rcx, %r13
	andq	%r15, %r13
	movq	%r13, %rbx
	salq	$4, %rbx
	addq	%r11, %rbx         <-- from register
	movq	(%rbx), %r14
	testq	%r14, %r14
	je	L429


The code around the rich compare used to do memory
loads that weren't necessary for the most likely case
(since the 64-bit hash values match, it is very likely
that the comparison will report a match).

BEFORE
------

call    _PyObject_RichCompareBool
movq    24(%rsp), %rcx
movq    (%rcx), %rdi
leaq    -1(%rdi), %rdx
testq   %rdx, %rdx
movq    %rdx, (%rcx)
je  L489
testl   %eax, %eax            
js  L437                      <--- predictable error branch
movq    40(%r12), %rdx        <--- memory load 
cmpq    16(%rsp), %rdx        <--- memory load
jne L460                 
cmpq    (%rbx), %rcx          <--- memory load  
jne L429                      <--- predictable restart branch
testl   %eax, %eax            <--- predictable found_active branch            
jne L432                      <--- most common exit point
movq    32(%r12), %rdx


AFTER
-----

	call	_PyObject_RichCompareBool
	movq	16(%rsp), %rcx
	movq	(%rcx), %rdi
	leaq	-1(%rdi), %rdx
	testq	%rdx, %rdx
	movq	%rdx, (%rcx)
	je	L485
	cmpl	$0, %eax
	jg	L431                  <-- common exit before the memory loads!
L490:
	jne	L434
	movq	40(%r12), %rdx    <--- memory load 
	cmpq	%rdx, 24(%rsp)    <--- memory load 
	movq	%rdx, %r11
	jne	L428
	cmpq	(%rbx), %rcx      <--- memory load 
	jne	L428
msg247135 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-22 17:02
> "you have to provide a benchmark"

Actually, I don't.  When making a small series of changes, benchmarking every step is waste of time and tends to trap you in local minimums and causes you to overfit to a particular processor, compiler, or benchmark.  The better process is to carefully work through what the code is telling the machine to do and evaluating whether those steps make sense.  This is done in conjunction with organizing the code in a more logical manner (i.e. only saving the so->table in the block where we're using it as a check to check if the rich comparison rearranged the table in a way the invalidated the entry pointer or made the current search invalid).  In general, less work is better, having related actions take place close together is better, making functions self-contained is better, etc.

If you want to team-up and help, your contribution is welcome.  General sniping isn't helpful at all.  I wrote all of this code and have maintained it for 13 years -- this series of refactorings has been something I've been working towards for a long time.
msg247148 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2015-07-22 19:10
Thanks for the clear explanation, Raymond.  The approach you've described is useful in a number of circumstances.  Would you mind publishing (somewhere outside the tracker; devguide?) the specific steps you take and the tools you use?
msg247155 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-22 23:29
For me, optimizing assembler is still an optimization. I give up, I just don't care of set performances.
msg247156 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-23 00:27
If PyObject_RichCompareBool() reports that a key is already present in the set, then set_add_entry() is under no further obligation to make an insertion.   As long as there is no risk of segfault, there is no more obligation to cater to lying or self-destructing __eq__ methods than there is to support objects that return randomized __hash__ values. 

The docs for set.add(...):  "Add an element to a set. This has no effect if the element is already present."  The latter condition is determined by the PyObject_RichCompareBool() check.  If it reports a match, then it is reasonable to do nothing.

FWIW, dicts don't have a choice in this regard because they still have an implementation that depends on returning a currently valid entry which they can't do if the table is mutating during the search.  The set_add_entry() implementation has the advantage in that it is self-contained and need only report -1 is an error arose during the comparison or a 0 to indicate that no error arose.  Also, note that sets diverged from dicts that the outset in that they don't swallow exceptions like PyDict_GetItem() does.
msg247178 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-23 08:55
The condition that is determined by the PyObject_RichCompareBool() check is not that the element is already present, but that the element was present. Even if we will agree that such behavior change is appropriate, the inconsistency with dict makes it doubtful (originally set was implemented with dict).

I have no objections against tweaking so->table caching. I even think this part of the patch improves the code.
msg247185 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-23 11:36
Sets are under no obligation to keep their code synced with dicts and have over time diverged in a number of ways.
msg247186 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-07-23 11:42
New changeset bc80c783c4ab by Raymond Hettinger in branch 'default':
Issue #24681:  Move the store of so->table to the code block where it is used.
https://hg.python.org/cpython/rev/bc80c783c4ab
msg247258 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-24 05:20
Serhiy, do you see anything actually wrong with the patch?  If it isn't broken in some clear cut way, I'm going to apply it shortly.  The comparison with dict internals is a red herring -- there is no promise need or precedent for making that code exactly the same (nor is there is restriction on PyPy or Jython which have already traveled down other paths not aligned with cpyhton dict internals).
msg247260 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-24 05:41
The patch makes set_add_entry() inconsistent not only with dict, but with other set methods that use set_lookkey(). I don't know if there is some subtle bug here, but I feel myself slightly uncomfortable with it. Also the new code looks a little less readable to me. Current code is straightforward translation from Python to C (first check error, then check that the result still is actual, finally check the result itself), the new code forces you to stop and think, it's correctness is not so obvious.

Even if all good with the patch, please don't commit it right now. Wait at least a week and then commit. May be someone other will find a flaw in the patch.
msg247747 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-07-31 14:59
New changeset 9e3be159d023 by Raymond Hettinger in branch 'default':
Issue #24681:  Move the most likely test first in set_add_entry().
https://hg.python.org/cpython/rev/9e3be159d023
History
Date User Action Args
2022-04-11 14:58:19adminsetgithub: 68869
2015-07-31 14:59:57rhettingersetstatus: open -> closed
resolution: fixed
2015-07-31 14:59:09python-devsetmessages: + msg247747
2015-07-25 22:30:04vstinnersetnosy: - vstinner
2015-07-24 05:41:39serhiy.storchakasetmessages: + msg247260
2015-07-24 05:20:14rhettingersetmessages: + msg247258
2015-07-23 12:06:59rhettingersetfiles: + move_likely_first.diff
2015-07-23 12:03:27rhettingersetfiles: - move_likely_first.diff
2015-07-23 11:45:47rhettingersetfiles: + move_likely_first.diff
2015-07-23 11:45:09rhettingersetassignee: serhiy.storchaka -> rhettinger
2015-07-23 11:42:43python-devsetnosy: + python-dev
messages: + msg247186
2015-07-23 11:36:37rhettingersetmessages: + msg247185
2015-07-23 08:55:32serhiy.storchakasetmessages: + msg247178
2015-07-23 00:27:42rhettingersetmessages: + msg247156
2015-07-22 23:29:01vstinnersetnosy: rhettinger, vstinner, r.david.murray, eric.snow, serhiy.storchaka
messages: + msg247155
2015-07-22 19:10:51eric.snowsetnosy: + eric.snow
messages: + msg247148
2015-07-22 17:02:26rhettingersetmessages: + msg247135
2015-07-22 16:47:34rhettingersetmessages: + msg247132
2015-07-22 13:29:23r.david.murraysetnosy: + r.david.murray
messages: + msg247117
2015-07-22 13:22:36serhiy.storchakasetmessages: + msg247116
2015-07-22 12:41:31vstinnersetmessages: + msg247114
2015-07-22 11:33:24rhettingersetmessages: + msg247109
2015-07-22 07:29:45vstinnersetnosy: + vstinner
messages: + msg247099
2015-07-22 05:51:58rhettingersetfiles: + better_test_order2.diff

messages: + msg247092
2015-07-22 05:26:40rhettingersettitle: Put most likely test first is set_add_entry() -> Put most likely test first in set_add_entry()
2015-07-22 04:01:49rhettingercreate