classification
Title: _thread.LockType: Optimize lock deletion, acquisition of uncontested lock and release of lock.
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: kristjan.jonsson, pitrou, python-dev, sbt
Priority: normal Keywords: patch

Created on 2012-06-21 14:11 by kristjan.jonsson, last changed 2012-06-24 20:18 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
lock.patch kristjan.jonsson, 2012-06-21 14:11 review
Messages (10)
msg163335 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-21 14:11
_thread.LockType contains code for sanity checking when the lock is released, and for unlocking it when it is deleted.  Both operations involve actually try-lock operations that are costly.  Use a flag on the log to keep track of its locking state instead.

Also, acquiring a lock now first tries a try-aqcuire without releasing the  GIL, same as _thread.RLock().  This is done by moving logic from RLock.acquire to acquire_timed() so that both locks benefit.


Improvement, on a 64 bit windows machine:
Before:
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock" "allocate_lock()"
1000000 loops, best of 3: 0.725 usec per loop
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.315 usec per loop

After:
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock" "allocate_lock()"
1000000 loops, best of 3: 0.691 usec per loop
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.294 usec per loop

This amounts to 5% and 7% improvement, respectively.
msg163348 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-06-21 16:48
On 32 bit linux in a VM I get 

BEFORE
  allocation       0.125
  acquire/release  0.434

AFTER
  allocation       0.109  (-13%)
  acquire/release  0.346  (-20%)
msg163388 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 09:31
Interesting, I need to see what is the reason for that.  Possibly, sem_* primitives used there are so cheap that they don't matter.
msg163404 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 13:30
I cannot repro.  On my ubuntu virtualbox I get:

before:
kristjan@kristjan-VirtualBox:~/cpython$ ./python -m timeit -s "from _thread import allocate_lock" "allocate_lock()"
10000000 loops, best of 3: 0.0768 usec per loop
kristjan@kristjan-VirtualBox:~/cpython$ ./python -m timeit -s "from _thread import allocate_lock;l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.209 usec per loop

after:
kristjan@kristjan-VirtualBox:~/cpython$ ./python -m timeit -s "from _thread import allocate_lock" "allocate_lock()"
10000000 loops, best of 3: 0.069 usec per loop
kristjan@kristjan-VirtualBox:~/cpython$ ./python -m timeit -s "from _thread import allocate_lock;l=allocate_lock()" "l.acquire();l.release()"
10000000 loops, best of 3: 0.182 usec per loop

I did this a few times with consistent results.  But virtual machines aren't the most stable of beasts.  Perhaps someone with a proper unix box can test it?
msg163406 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-22 13:35
64-bit Linux:

./python -m timeit -s "from _thread import allocate_lock" "allocate_lock()"
-> before: 0.0712 usec per loop
-> after: 0.0556 usec per loop

./python -m timeit -s "from _thread import allocate_lock;l=allocate_lock()" "l.acquire();l.release()"
-> before: 0.177 usec per loop
-> after: 0.168 usec per loop

So there's a small improvement on artificial benchmarks.
msg163408 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 13:56
well, I'm thinking for example about threading.Condition, where Lock objects are being created and erased willy nilly.  
Also, locks are normally uncontested.  The unconstested case is the one to optimize for. As such, the test cases are not artificial at all.

The implementers of RLock in _threadingmodule obviously thought this was important, I'm merely bringing that same benefit to the Lock object.
msg163411 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-22 14:16
> Also, locks are normally uncontested.  The unconstested case is the
> one to optimize for. As such, the test cases are not artificial at
> all.

Ok. I don't have any problem with the patch in principle (I didn't read
it in detail). I was just a bit surprised that you cared so much about
performance of lock allocation.

If you want the patch to go in 3.3, you will have to commit it before
the beta release, which is very soon (this week-end I think).
msg163425 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 15:58
Lock release (the silly saninty check) and acquisition of an uncontested lock.

I was waiting for that other windows lock stuff to go in before doing this, because I thought they might actually overlap in implementation, which they didn't.

I'll check this in later.  Meanwhile look at the other, slightly more contentious patch, issue #15139.

(condition variables are the whole reason I've been messing around with locks of late, since I had to make sure my threadpool-related HTTP requests on the PS3 performed adequately)
msg163455 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-22 18:44
New changeset dfc7fd24983a by Kristjan Valur Jonsson in branch 'default':
Issue #15124: Optimize _thread.LockType deletion and acquisition when
http://hg.python.org/cpython/rev/dfc7fd24983a
msg163844 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-24 20:18
Closing as fixed.
History
Date User Action Args
2012-06-24 20:18:42pitrousetstatus: open -> closed
resolution: fixed
messages: + msg163844
2012-06-22 18:45:45kristjan.jonssonsetstage: resolved
2012-06-22 18:44:56python-devsetnosy: + python-dev
messages: + msg163455
2012-06-22 15:58:11kristjan.jonssonsetmessages: + msg163425
2012-06-22 14:16:45pitrousetmessages: + msg163411
2012-06-22 13:56:56kristjan.jonssonsetmessages: + msg163408
2012-06-22 13:35:41pitrousetnosy: + pitrou
messages: + msg163406
2012-06-22 13:30:11kristjan.jonssonsetmessages: + msg163404
2012-06-22 09:31:14kristjan.jonssonsetmessages: + msg163388
2012-06-21 16:48:41sbtsetnosy: + sbt
messages: + msg163348
2012-06-21 14:12:01kristjan.jonssonsettype: performance
components: + Interpreter Core
versions: + Python 3.3
2012-06-21 14:11:39kristjan.jonssoncreate