classification
Title: Speed up threading.Condition wakeup
Type: performance Stage: patch review
Components: Interpreter Core, Library (Lib) Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Catalin.Patulea, anacrolix, georg.brandl, gregory.p.smith, jcea, kristjan.jonsson, loewis, pitrou, sbt, tim.peters
Priority: normal Keywords: patch

Created on 2012-06-22 15:38 by kristjan.jonsson, last changed 2014-04-04 14:18 by pitrou.

Files
File name Uploaded Description Edit
condition.patch kristjan.jonsson, 2012-06-22 15:38 review
queuetest.py kristjan.jonsson, 2012-06-22 15:38 Test file to measure wakeup delay
condition.patch kristjan.jonsson, 2012-06-23 21:19 review
condition.patch kristjan.jonsson, 2012-07-03 11:03 review
Messages (17)
msg163423 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 15:38
This patch is based on work previously done locally in python 2.7, see http://blog.ccpgames.com/kristjan/2012/05/25/optimizing-python-condition-variables-with-telemetry/

The idea is to acquire the signaling Lock() of the threading.Condition object _and_ the outer lock, without releasing and re-aquiring the GIL in between that is, as an atomic operation from the program's point of view.

_thread.LockType objects, used by the Condition object, now have an _acquire_condition method which knows how to do it for LockType and RLock objects.

Both lock types grow _release_save and _acquire_restore methods, the latter of which is a no-op.

_acquire_coindition() will ignore any other kinds of locks, including subclasses (if such things exist).

The result of this is a very marked improvement in signaling time on a contested machine.  A file, queuetest.py is provided which measures the wakeup delay.

Typical before the change:
best:    totaltime: 0.167006, avg delay: 0.004766, delay_stddev: 0.002058
typical: totaltime: 0.154271, avg delay: 0.027621, delay_stddev: 0.007479

after the change
typical: totaltime: 0.169217, avg delay: 0.001410, delay_stddev: 0.000722
msg163426 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-22 16:03
I think this should wait for 3.4. It's a bit too late now.
msg163427 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 16:13
aw, that's too bad.  Since this is not a feature implementat, we have the entire beta cycle to shake out any possible bugs.
msg163428 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-22 16:21
> aw, that's too bad.  Since this is not a feature implementat, we have
> the entire beta cycle to shake out any possible bugs.

We treat performance improvements as new features, though.
msg163436 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-22 16:58
hm, still one more day.  It will be a pity to wait two more years, perhaps Georg is feeling lucky :)
msg163592 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-23 12:00
Antoine is much more of an expert here, and I defer to his judgment that it is better to wait.
msg163597 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-23 12:29
I believe the patch is incorrect. It changes self._acquire_restore into a no-op, claiming that lock_acquire_condition will correctly restore the lock's state.

However, lock_acquire_condition may fail (e.g. if the timeout is not strictly positive), in which case the lock's case isn't properly restored.
msg163659 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-23 19:19
_acquire_condition() should _always_ re-aquire the outer lock, even for a timeout or an interrupt.

I see however that it doesn't do so if there is an argument error, but that should not be a problem for an internal method, since it would mean that threading.Condition() were incorrectly implemented.
msg163660 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-23 19:21
I'll fix the patch so that the lock is _always_ re-aqcuired.
msg163664 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-23 20:21
If the claim is that the _acquire_restore call becomes unnecessary, the entire try-finally should go. If that causes _acquire_restore to become unused, it should also go.
msg163676 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-23 21:19
Made lock_acquire_condition() more robust, always acquiring the outer lock if possible.
For Lock objects (not RLocks), set the "locked" variable that was recently added.
msg163681 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-06-23 21:37
Well, the rationale for keeping the try_finally is that the Condition should continue to work with other Locks, as long as they provide _release_save and _acquire_restore methods.  Lock._acquire_condtion onnly knows Lock and RLock intimately enough to automatically reaquire them with teh GIL released.

If I remove the _acuire_restore, we have prohibited the Condition class from using customized locks, or subclasses of locks.
msg164347 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-06-29 18:41
Currently negative timeouts are treated as zero timeouts by Condition.wait().  The patches turns them into errors.
msg164590 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-07-03 11:03
Thank you Richard.
A new patch is included.  Now the processing of "timeout" is done in _acquire_condition().  None is infinite, and negative timeouts are clipped to zero.

Do you feel that it is unnecessary to be able to support other locks than Lock() and RLock() as the outer lock?  If so, then we can drop the "_acquire_restore()" as suggested by Martin.
msg164647 - (view) Author: Matt Joiner (anacrolix) Date: 2012-07-04 14:39
Did this make it into 3.3?
msg164650 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-07-04 15:52
Nope.
msg215520 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-04 13:35
In our 2.7 branches, this approach has been superseded with a natively impolemented _Condition class.  This is even more efficient.  It is available if the underlying Lock implementation is based on pthread locks (not semaphores).
History
Date User Action Args
2014-04-04 14:18:20pitrousetnosy: + tim.peters
2014-04-04 13:35:20kristjan.jonssonsetmessages: + msg215520
2013-04-28 05:53:09Catalin.Patuleasetnosy: + Catalin.Patulea
2012-07-04 15:52:38gregory.p.smithsetmessages: + msg164650
2012-07-04 14:39:41anacrolixsetnosy: + anacrolix
messages: + msg164647
2012-07-03 11:03:15kristjan.jonssonsetfiles: + condition.patch

messages: + msg164590
2012-06-29 18:41:59sbtsetmessages: + msg164347
2012-06-23 21:47:26gregory.p.smithsetnosy: + gregory.p.smith
2012-06-23 21:37:26kristjan.jonssonsetmessages: + msg163681
2012-06-23 21:19:16kristjan.jonssonsetfiles: + condition.patch

messages: + msg163676
2012-06-23 20:21:28loewissetmessages: + msg163664
2012-06-23 19:21:43kristjan.jonssonsetmessages: + msg163660
2012-06-23 19:19:45kristjan.jonssonsetmessages: + msg163659
2012-06-23 12:29:14loewissetnosy: + loewis
messages: + msg163597
2012-06-23 12:00:10georg.brandlsetmessages: + msg163592
2012-06-22 16:58:21kristjan.jonssonsetnosy: + georg.brandl
messages: + msg163436
2012-06-22 16:21:04pitrousetmessages: + msg163428
2012-06-22 16:13:41kristjan.jonssonsetmessages: + msg163427
2012-06-22 16:03:18pitrousetversions: + Python 3.4, - Python 3.3
nosy: + pitrou, sbt

messages: + msg163426

stage: patch review
2012-06-22 15:48:23jceasetnosy: + jcea
2012-06-22 15:38:57kristjan.jonssonsetfiles: + queuetest.py
2012-06-22 15:38:31kristjan.jonssoncreate