Issue2179
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008-02-24 23:10 by jyasskin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
faster_with.patch | jyasskin, 2008-03-03 00:29 | proof of concept | ||
faster_with.patch | jyasskin, 2008-03-03 02:34 | complete |
Messages (10) | |||
---|---|---|---|
msg62951 - (view) | Author: Jeffrey Yasskin (jyasskin) * | Date: 2008-02-24 23:10 | |
Currently, 'with' costs about .2us over try/finally: $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally: lock.release()' 1000000 loops, best of 3: 0.617 usec per loop $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass' 1000000 loops, best of 3: 0.774 usec per loop Since it's doing the same thing (and calling the same C functions to do the lock manipulation), it shouldn't take more time. The bytecodes associated with the two options look like: 2) with lock: 3) pass 2 0 LOAD_GLOBAL 0 (lock) 3 DUP_TOP 4 LOAD_ATTR 1 (__exit__) 7 STORE_FAST 0 (_[1]) 10 LOAD_ATTR 2 (__enter__) 13 CALL_FUNCTION 0 16 POP_TOP 17 SETUP_FINALLY 4 (to 24) 3 20 POP_BLOCK 21 LOAD_CONST 0 (None) >> 24 LOAD_FAST 0 (_[1]) 27 DELETE_FAST 0 (_[1]) 30 WITH_CLEANUP 31 END_FINALLY 32 LOAD_CONST 0 (None) 35 RETURN_VALUE 6) lock.acquire() 7) try: 8) pass 9) finally: 10) lock.release() 6 0 LOAD_GLOBAL 0 (lock) 3 LOAD_ATTR 1 (acquire) 6 CALL_FUNCTION 0 9 POP_TOP 7 10 SETUP_FINALLY 4 (to 17) 8 13 POP_BLOCK 14 LOAD_CONST 0 (None) 10 >> 17 LOAD_GLOBAL 0 (lock) 20 LOAD_ATTR 2 (release) 23 CALL_FUNCTION 0 26 POP_TOP 27 END_FINALLY 28 LOAD_CONST 0 (None) 31 RETURN_VALUE The major difference I see is the extra local variable (_[1]) used by with. It looks like the compiler ought to be able to save that on the stack instead, and save 3 opcodes. Neal Norwitz suggested that, if that optimization is impossible, WITH_CLEANUP could be modified to take the variable as a parameter, which would let it absorb the LOAD_FAST and DELETE_FAST instructions. I've added everyone on the previous bug to the nosy list. Sorry if you don't care. :) |
|||
msg63165 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2008-03-01 16:24 | |
A closer approximation of what the with statement is doing would be: exit = lock.release() lock.acquire() try: pass finally: exit() The problem with trying to store the result of the retrieval of __exit__ on the stack is that we need to leave the context manager itself on top of the stack for the next LOAD_ATTR opcode (when we're retrieving the __enter__ method). However, changing WITH_CLEANUP to take an argument indicating which local variable holds the bound __exit__ method sounds like it might work. |
|||
msg63167 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2008-03-01 16:34 | |
Scratch the parentheses on that first line of sample code in my previous comment (isn't copy and paste wonderful?) |
|||
msg63197 - (view) | Author: Jeffrey Yasskin (jyasskin) * | Date: 2008-03-03 00:29 | |
Here's a proof-of-concept patch that keeps the __exit__ method on the stack. It uses ROT_TWO to stuff it under the context object instead of storing it into a temporary. (Thanks Nick for pointing out that problem before I had to waste time on it.) test_with passes, although I need to update several more things and maybe fix a refleak. The patch changes the compilation of: def with_(l): with l: pass from 4 0 LOAD_FAST 0 (l) 3 DUP_TOP 4 LOAD_ATTR 0 (__exit__) 7 STORE_FAST 1 (_[1]) 10 LOAD_ATTR 1 (__enter__) 13 CALL_FUNCTION 0 16 POP_TOP 17 SETUP_FINALLY 4 (to 24) 5 20 POP_BLOCK 21 LOAD_CONST 0 (None) >> 24 LOAD_FAST 1 (_[1]) 27 DELETE_FAST 1 (_[1]) 30 WITH_CLEANUP 31 END_FINALLY 32 LOAD_CONST 0 (None) 35 RETURN_VALUE to 4 0 LOAD_FAST 0 (l) 3 DUP_TOP 4 LOAD_ATTR 0 (__exit__) 7 ROT_TWO 8 LOAD_ATTR 1 (__enter__) 11 CALL_FUNCTION 0 14 POP_TOP 15 SETUP_FINALLY 4 (to 22) 5 18 POP_BLOCK 19 LOAD_CONST 0 (None) >> 22 WITH_CLEANUP 23 END_FINALLY 24 LOAD_CONST 0 (None) 27 RETURN_VALUE And speeds it up from: $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass' 1000000 loops, best of 3: 0.832 usec per loop to: $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'with lock: pass' 1000000 loops, best of 3: 0.762 usec per loop That's only half of the way to parity with try/finally: $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally: lock.release()' 1000000 loops, best of 3: 0.638 usec per loop What's strange is that calling __enter__ and __exit__ in a try/finally block brings the speed back to the faster 'with' speed, even though they call the same C functions: $ ./python.exe -m timeit -s 'import thread; lock = thread.allocate_lock()' 'lock.__enter__()' 'try: pass' 'finally: lock.__exit__()' 1000000 loops, best of 3: 0.754 usec per loop Any ideas? |
|||
msg63200 - (view) | Author: Jeffrey Yasskin (jyasskin) * | Date: 2008-03-03 02:34 | |
Now with documentation, a working test_compile, and one less refleak. |
|||
msg63323 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2008-03-06 15:53 | |
> What's strange is that calling __enter__ and __exit__ in a > try/finally block brings the speed back to the faster 'with' speed, > even though they call the same C functions Looking carefully at the code, there are two reasons for this: - LockType has no methods! try "dir(thread.LockType)". Instead, LockType defines a tp_getattr which does a *linear* search in a PyMethodDef array. First items are served faster... - After converting this to use the usual tp_methods slot, there is still a performance difference, due to the fact that release() is a METH_NOARGS method, while __exit__() uses METH_VARARGS. Phew. |
|||
msg63353 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2008-03-07 12:01 | |
Patch applied cleanly for me and all tests pass. It also looked good on a visual scan over the diff text. |
|||
msg63355 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2008-03-07 14:17 | |
I went ahead and committed the change to the bytecode generation as r61290. The deficiencies in the lock implementation should probably be raised as a separate issue, so I'm closing this one. |
|||
msg63356 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2008-03-07 14:43 | |
Hm, my tests do not see any speedup with this patch. I used VS2005 on win2K, and VS2008 on winXP. Timings are very similar before and after this patch. Maybe the optimization is only useful with gcc? |
|||
msg63425 - (view) | Author: Jeffrey Yasskin (jyasskin) * | Date: 2008-03-09 17:03 | |
Thanks Nick and Amaury! Amaury, what times are you seeing? It could be a just-gcc speedup, but I wouldn't have thought so since the patch eliminates 2 times around the eval loop. It could be that the overhead of WITH_CLEANUP is high enough to drown out those iterations. You had mentioned optimizing the PyObject_CallFunctionObjArgs() call before. If you're still seeing with significantly slower than try, that's probably the right place to look. Here are my current timings. To avoid the lock issues, I wrote simple_cm.py containing: class CM(object): def __enter__(self): pass def __exit__(self, *args): pass $ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()' 'with cm: pass' 1000000 loops, best of 3: 0.885 usec per loop $ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()' 'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()' 1000000 loops, best of 3: 0.858 usec per loop If __exit__ doesn't contain *args (making it not a context manager), the try/finally time goes down to: $ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()' 'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()' 1000000 loops, best of 3: 0.619 usec per loop I think in theory, with could be slightly faster than finally with the same argument list, but it's pretty close now. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:31 | admin | set | github: 46432 |
2008-03-09 17:03:40 | jyasskin | set | messages: + msg63425 |
2008-03-08 00:08:03 | jcea | set | nosy: + jcea |
2008-03-07 14:43:02 | amaury.forgeotdarc | set | messages: + msg63356 |
2008-03-07 14:17:51 | ncoghlan | set | status: open -> closed resolution: accepted messages: + msg63355 |
2008-03-07 12:01:48 | ncoghlan | set | messages: + msg63353 |
2008-03-06 15:54:00 | amaury.forgeotdarc | set | messages: + msg63323 |
2008-03-03 15:27:57 | belopolsky | set | nosy: + belopolsky |
2008-03-03 02:34:12 | jyasskin | set | files:
+ faster_with.patch messages: + msg63200 |
2008-03-03 00:29:46 | jyasskin | set | files:
+ faster_with.patch keywords: + patch messages: + msg63197 |
2008-03-01 16:34:03 | ncoghlan | set | messages: + msg63167 |
2008-03-01 16:24:17 | ncoghlan | set | nosy:
+ ncoghlan messages: + msg63165 |
2008-02-25 00:56:30 | collinwinter | set | nosy: + collinwinter |
2008-02-24 23:10:40 | jyasskin | create |