Message236928
Python's core is full of bound checks like this one in Objects/listobject.c:
static PyObject *
list_item(PyListObject *a, Py_ssize_t i)
{
if (i < 0 || i >= Py_SIZE(a)) {
...
Abner Fog's high-level language optimization guide, http://www.agner.org/optimize/optimizing_cpp.pdf in section 14.2 Bounds Checking, shows a way to fold this into a single check:
- if (i < 0 || i >= Py_SIZE(a)) {
+ if ((unsigned)i >= (unsigned)(Py_SIZE(a))) {
if (indexerr == NULL) {
indexerr = PyUnicode_FromString(
"list index out of range");
The old generated assembly code looks like this:
_list_item:
subq $8, %rsp
testq %rsi, %rsi
js L227
cmpq 16(%rdi), %rsi
jl L228
L227:
... <error reporting and exit > ...
L228:
movq 24(%rdi), %rax
movq (%rax,%rsi,8), %rax
addq $1, (%rax)
addq $8, %rsp
ret
The new disassembly looks like this:
_list_item:
cmpl %esi, 16(%rdi)
ja L227
... <error reporting and exit > ...
L227:
movq 24(%rdi), %rax
movq (%rax,%rsi,8), %rax
addq $1, (%rax)
ret
Note, the new code not only saves a comparison/conditional-jump pair, it also avoids the need to adjust %rsp on the way in and the way out for a net savings of four instructions along the critical path.
When we have good branch prediction, the current approach is very low cost; however, Abner Fog's recommendation is never more expensive, is sometimes cheaper, saves a possible misprediction, and reduces the total code generated. All in all, it is a net win.
I recommend we put in a macro of some sort so that this optimization gets expressed exactly once in the code and so that it has a good clear name with an explanation of what it does. |
|
Date |
User |
Action |
Args |
2015-03-01 00:50:38 | rhettinger | set | recipients:
+ rhettinger |
2015-03-01 00:50:38 | rhettinger | set | messageid: <1425171038.63.0.240295916037.issue23553@psf.upfronthosting.co.za> |
2015-03-01 00:50:38 | rhettinger | link | issue23553 messages |
2015-03-01 00:50:37 | rhettinger | create | |
|