Issue39737
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2020-02-24 04:56 by Dennis Sweeney, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 18638 | merged | Dennis Sweeney, 2020-02-24 04:57 |
Messages (12) | |||
---|---|---|---|
msg362566 - (view) | Author: Dennis Sweeney (Dennis Sweeney) * | Date: 2020-02-24 04:56 | |
The following tiny change: diff --git a/Objects/listobject.c b/Objects/listobject.c index 3c39c6444b..3ac03b71d0 100644 --- a/Objects/listobject.c +++ b/Objects/listobject.c @@ -2643,8 +2643,7 @@ list_richcompare(PyObject *v, PyObject *w, int op) Py_INCREF(vitem); Py_INCREF(witem); - int k = PyObject_RichCompareBool(vl->ob_item[i], - wl->ob_item[i], Py_EQ); + int k = PyObject_RichCompareBool(vitem, witem, Py_EQ); Py_DECREF(vitem); Py_DECREF(witem); if (k < 0) Creates the following performance improvement: Before: > .\python.bat -m timeit -s "A = list(range(10**7)); B = list(range(10**7))" "A==B" 2 loops, best of 5: 134 msec per loop > .\python.bat -m timeit -s "A = list(range(10**7)); B = list(range(10**7))" "A==B" 2 loops, best of 5: 134 msec per loop After: > .\python.bat -m timeit -s "A = list(range(10**7)); B = list(range(10**7))" "A==B" 2 loops, best of 5: 126 msec per loop > .\python.bat -m timeit -s "A = list(range(10**7)); B = list(range(10**7))" "A==B" 2 loops, best of 5: 126 msec per loop |
|||
msg362646 - (view) | Author: Dong-hee Na (corona10) * | Date: 2020-02-25 15:08 | |
IMHO, I can not see a noticeable performance improvement. I think that the modern compiler will optimize the reused variable. ;) The below result might be noise. [master] ./python.exe -m pyperf timeit "A = list(range(10**3)); B = list(range(10**3))" "A==B" ..................... Mean +- std dev: 415 us +- 6 us [PR 18638] ./python.exe -m pyperf timeit "A = list(range(10**3)); B = list(range(10**3))" "A==B" ..................... Mean +- std dev: 433 us +- 20 us |
|||
msg362654 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-02-25 18:16 | |
I have doubts that such small change can lead to significant speed up. It is likely a compiler glitch. But this change makes the code clearer. If you remove a NEWS entry with a doubtful promise I will accept it. |
|||
msg362665 - (view) | Author: Dennis Sweeney (Dennis Sweeney) * | Date: 2020-02-25 20:30 | |
I made the requested changes to reflect that this is for code cleanliness rather than strictly for performance. However, it appears that Visual Studio on Windows 10 was not doing the optimization one might expect. In particular, here is the disassembly before this PR: =========================================== PyObject *vitem = vl->ob_item[i]; 78E20C49 mov eax,dword ptr [edx+0Ch] PyObject *vitem = vl->ob_item[i]; 78E20C4C mov ecx,dword ptr [eax+edi*4] PyObject *witem = wl->ob_item[i]; 78E20C4F mov eax,dword ptr [esi+0Ch] 78E20C52 mov dword ptr [vitem],ecx 78E20C55 mov esi,dword ptr [eax+edi*4] 78E20C58 mov dword ptr [witem],esi if (vitem == witem) { 78E20C5B cmp ecx,esi 78E20C5D je list_richcompare+12Bh (78E20CFBh) continue; } Py_INCREF(vitem); 78E20C63 inc dword ptr [ecx] Py_INCREF(witem); 78E20C65 inc dword ptr [esi] int k = PyObject_RichCompareBool(vl->ob_item[i], 78E20C67 mov eax,dword ptr [w] 78E20C6A mov eax,dword ptr [eax+0Ch] 78E20C6D mov ebx,dword ptr [eax+edi*4] 78E20C70 mov eax,dword ptr [edx+0Ch] 78E20C73 mov eax,dword ptr [eax+edi*4] 78E20C76 cmp eax,ebx 78E20C78 jne list_richcompare+0B1h (78E20C81h) 78E20C7A mov ebx,1 78E20C7F jmp list_richcompare+100h (78E20CD0h) 78E20C81 push 2 78E20C83 push ebx 78E20C84 push eax 78E20C85 call PyObject_RichCompare (78E35120h) 78E20C8A mov esi,eax 78E20C8C add esp,0Ch 78E20C8F test esi,esi 78E20C91 jne list_richcompare+0C8h (78E20C98h) 78E20C93 or ebx,0FFFFFFFFh 78E20C96 jmp list_richcompare+0FAh (78E20CCAh) 78E20C98 cmp dword ptr [esi+4],offset PyBool_Type (790A3420h) =========================================== And after this PR: =========================================== PyObject *vitem = vl->ob_item[i]; 795E0C4A mov eax,dword ptr [edx+0Ch] PyObject *vitem = vl->ob_item[i]; 795E0C4D mov ebx,dword ptr [eax+ecx*4] PyObject *witem = wl->ob_item[i]; 795E0C50 mov eax,dword ptr [edi+0Ch] 795E0C53 mov edi,dword ptr [eax+ecx*4] if (vitem == witem) { 795E0C56 cmp ebx,edi 795E0C58 je list_richcompare+109h (795E0CD9h) continue; } Py_INCREF(vitem); 795E0C5A inc dword ptr [ebx] Py_INCREF(witem); 795E0C5C inc dword ptr [edi] int k = PyObject_RichCompareBool(vitem, witem, Py_EQ); 795E0C5E push 2 795E0C60 push edi 795E0C61 push ebx 795E0C62 call PyObject_RichCompare (795F5120h) 795E0C67 mov esi,eax 795E0C69 add esp,0Ch 795E0C6C test esi,esi 795E0C6E jne list_richcompare+0A5h (795E0C75h) 795E0C70 or esi,0FFFFFFFFh 795E0C73 jmp list_richcompare+0DBh (795E0CABh) 795E0C75 cmp dword ptr [esi+4],offset PyBool_Type (79863420h) 795E0C7C jne list_richcompare+0BBh (795E0C8Bh) 795E0C7E xor eax,eax 795E0C80 cmp esi,offset _Py_TrueStruct (798633FCh) 795E0C86 sete al 795E0C89 jmp list_richcompare+0C4h (795E0C94h) 795E0C8B push esi 795E0C8C call PyObject_IsTrue (795F62F0h) 795E0C91 add esp,4 795E0C94 add dword ptr [esi],0FFFFFFFFh 795E0C97 mov dword ptr [k],eax 795E0C9A jne list_richcompare+0D8h (795E0CA8h) =========================================== |
|||
msg362676 - (view) | Author: Dong-hee Na (corona10) * | Date: 2020-02-26 02:07 | |
> it appears that Visual Studio on Windows 10 was not doing the optimization one might expect. Hmm, Is this build on release mode? |
|||
msg362678 - (view) | Author: Dennis Sweeney (Dennis Sweeney) * | Date: 2020-02-26 02:50 | |
> Hmm, Is this build on release mode? Yes--in debug mode, each Py_INCREF is these 8 instructions: ======================== 78BD071E mov ecx,dword ptr [_Py_RefTotal (79039700h)] 78BD0724 add ecx,1 78BD0727 mov dword ptr [_Py_RefTotal (79039700h)],ecx 78BD072D mov edx,dword ptr [ebp-10h] 78BD0730 mov eax,dword ptr [edx] 78BD0732 add eax,1 78BD0735 mov ecx,dword ptr [ebp-10h] 78BD0738 mov dword ptr [ecx],eax ======================== instead of just the one `inc` in release mode. |
|||
msg362679 - (view) | Author: Dong-hee Na (corona10) * | Date: 2020-02-26 03:09 | |
Debug mode is not meaningful. Visual Studio will optimize fully on release mode. |
|||
msg362680 - (view) | Author: Dennis Sweeney (Dennis Sweeney) * | Date: 2020-02-26 03:57 | |
> Debug mode is not meaningful. > Visual Studio will optimize fully on release mode. Sorry if I wasn't clear--the original assembly difference I posted in (https://bugs.python.org/msg362665) was indeed using the "release" build configuration. My last comment was trying to demonstrate this fact by showing that the assembly output would have looked different if I had selected the "debug" build configuration. Is there something I'm missing, or was Visual Studio just missing an opportunity to optimize? |
|||
msg362681 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2020-02-26 04:42 | |
Speed-up or not, I concur that the new code it clearer. |
|||
msg362683 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-02-26 07:00 | |
New changeset be7ead62db9a1db3e2cd997b0beffd4480e51f5c by sweeneyde in branch 'master': bpo-39737: Remove code repitition in list_richcompare (GH-18638) https://github.com/python/cpython/commit/be7ead62db9a1db3e2cd997b0beffd4480e51f5c |
|||
msg362684 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2020-02-26 07:01 | |
Thank you for your contribution Dennis. |
|||
msg363193 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-03-02 17:39 | |
I suggest to use LTO + PGO optimizations when benchmarking Python: https://pyperformance.readthedocs.io/usage.html#how-to-get-stable-benchmarks Benchmarking is hard :-/ |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:27 | admin | set | github: 83918 |
2020-03-02 17:39:23 | vstinner | set | messages: + msg363193 |
2020-02-26 07:01:45 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg362684 stage: patch review -> resolved |
2020-02-26 07:00:39 | serhiy.storchaka | set | messages: + msg362683 |
2020-02-26 04:42:48 | rhettinger | set | nosy:
+ rhettinger messages: + msg362681 |
2020-02-26 03:57:32 | Dennis Sweeney | set | messages: + msg362680 |
2020-02-26 03:09:47 | corona10 | set | messages: + msg362679 |
2020-02-26 02:50:23 | Dennis Sweeney | set | messages: + msg362678 |
2020-02-26 02:07:31 | corona10 | set | messages: + msg362676 |
2020-02-25 20:30:06 | Dennis Sweeney | set | messages: + msg362665 |
2020-02-25 18:16:51 | serhiy.storchaka | set | messages: + msg362654 |
2020-02-25 15:08:23 | corona10 | set | messages: + msg362646 |
2020-02-24 16:06:00 | SilentGhost | set | nosy:
+ vstinner, serhiy.storchaka, corona10 |
2020-02-24 04:57:41 | Dennis Sweeney | set | keywords:
+ patch stage: patch review pull_requests: + pull_request18004 |
2020-02-24 04:56:41 | Dennis Sweeney | create |