Issue8799
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010-05-24 09:44 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
locktest.patch | kristjan.jonsson, 2010-05-24 09:44 | review | ||
lock_tests.diff | kristjan.jonsson, 2012-04-09 13:29 | review | ||
tpconditiondoc.patch | pitrou, 2012-04-10 18:39 | |||
lock_tests.patch | kristjan.jonsson, 2013-11-08 14:13 | updated ro | review |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 2320 | merged | vstinner, 2017-06-22 00:00 |
Messages (28) | |||
---|---|---|---|
msg106349 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2010-05-24 09:44 | |
The tests for the ConditionVariable a fragile. On a slow computer, or when running a DEBUG build, the main thread can issue a condition.notify() call, even though the worker threads have not settled in on a wait() call. This causes the test to stall. The provided patch "fixes" this by adding a short _wait() before the notify. |
|||
msg106355 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-05-24 11:43 | |
Strange, I have never noticed this (even on the buildbots). |
|||
msg106360 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2010-05-24 12:11 | |
Well, I saw this occasionally on my dual core laptop when working on the RWLock. A notify() before a target does a wait() is ineffective and it would wait indefinitely after the notify(5) call. This isn't a "proper" fix (I had another, more complex, where we would keep a counter for the number of successful "waiters" and wait for that to reach the target value) but it is in the spirit of many of these tests, where we rely on _wait() to be sufficient to reach a certain state. |
|||
msg157844 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-09 13:29 | |
Here is a new patch. 1) I´ve simplified and relaxed test_notify() for Condition objects. Condition variables don't guarantee that there won't be any spurious wakeups so the test must maintain internal bookkeeping so that it doesn't break with a different implementation of Condition. 2) The main thread now properly waits for the workers to "settle" in the test. 2) I've added two generic tests of Condition objects in their "natural environment" as building blocks for bigger things, a Queue and a Lock. |
|||
msg157847 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-09 15:01 | |
> Condition variables don't guarantee that there won't be any spurious > wakeups What do you mean? The implementation doesn't seem prone to spurious wakeups, and the docs don't say so either. > I've added two generic tests of Condition objects in their "natural > environment" as building blocks for bigger things, a Queue and a Lock. Why would you build a Lock out of a Condition? |
|||
msg157942 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-10 10:05 | |
We shouldn't be testing implementation details. The Condition variables are canonically prone to "spurious wakeups" and "stolen wakeups". The fact that the current implementation doesn't have them shouldn't place that burden on future implementations of threading.Condition on this platform. From the docs: "Note: Condition variables can be, depending on the implementation, subject to both spurious wakeups (when wait() returns without a notify() call) and stolen wakeups (when another thread acquires the lock before the awoken thread.) For this reason, it is always necessary to verify the state the thread is waiting for when wait() returns and optionally repeat the call as often as necessary." 2) You would build a lock yourself if you wanted exact control over its behaviour. In this case, we are just doing what threading.Lock() does, but it could be more complicated. At any rate, it proves a useful toy usage of a condition variable in the context of making sure that it works properly as part of the unit tests. I could leave the original test in place and just add the new ones, but the old test is at least provably a subject to a race condition as per the original issue. |
|||
msg157944 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-10 10:38 | |
> The Condition variables are canonically prone to "spurious wakeups" > and "stolen wakeups". No, they aren't. Just because POSIX says they are doesn't mean *our* condition variables are the same. Spurious wakeups are an annoyance, and our implementation AFAICT never exhibited them. > From the docs: "Note: Condition variables can be, depending on the > implementation, subject to both spurious wakeups (when wait() returns > without a notify() call) and stolen wakeups (when another thread > acquires the lock before the awoken thread.) For this reason, it is > always necessary to verify the state the thread is waiting for when > wait() returns and optionally repeat the call as often as necessary." Ah, thanks, indeed. Except that... this was added by yourself in 483bbebc57bf, after issue 10260, but *without* being part of the original patch that you uploaded on that issue. So this never got reviewed and was instead sneaked in the docs in a commit of yours. Unless other people disagree, I think this addition should be reverted. |
|||
msg157968 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-10 17:19 | |
Stolen wakeups are a fact of life though, even in cpython's implementation of condition variables. And for the user of a condition variable the difference is so subtle as to not warrant special mention. This is, btw, not just a posix thing. It is same on windows, java, and in fact, comes from the original definition of a condition variable as part of the "monitor" pattern. Condition variables are defined have this caveat to allow for flexibility in their implementation. Look anywhere in the internets and you will find this, including here: http://www.cs.berkeley.edu/~kubitron/courses/cs162/hand-outs/synch.html The canonical and correct way to use a condition variable is to repeatedly wait() and verify that the wakeup condition holds. The wait_for() method abstracts this into a convenient helper. There is no need to befuddle the issue by special casing the python version as being guaranteed to not have suprious wakeups, but otherwise do have the stolen thread problem. Quite the contrary, we should be careful to never overspecify our interfaces so that we may not change their ipmlementation details later. A good summary of the situation and the reason for the specification can be found here: http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups The comment I added to the documentation clarifies the reason why other examples in the docs were using while loops and teaches shows people how to use these constructs in a robust manner. Removing it is a bad idea and I think you will find others will agree. Take it up on python-dev if you want. Unless you disagree, I'll add these test cases in addition to the broken one, which can then be fixed or removed later if it turns out to be problematic to other people. |
|||
msg157972 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-04-10 18:01 | |
>> The Condition variables are canonically prone to "spurious wakeups" >> and "stolen wakeups". > > No, they aren't. Just because POSIX says they are doesn't mean *our* > condition variables are the same. Spurious wakeups are an annoyance, and > our implementation AFAICT never exhibited them. I agree with Antoine. Spurious wakeups are an implementation issue (mainly has to do with efficiency on SMP systems and interrupted syscalls), but are definitely not a part of the "standard" API. Since our implementation doesn't exhibit this problem, there's no reason to scare and confuse people. For example, our locks are special in the sense that they can be released by a thread other than the one that acquired it, etc, and this behavior is documented. However, while I think the spurious wakeups reference could be removed, I think that the advice of repeatedly chcking the invariant shoud be kept, because it is always desirable (stolen wakeup, and better logic). |
|||
msg157973 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-10 18:03 | |
> Stolen wakeups are a fact of life though, even in cpython's > implementation of condition variables. And for the user of a > condition variable the difference is so subtle as to not warrant > special mention. I don't think it's a subtle difference. Being woken up too late is not the same as being woken up spuriously. Also, the term "stolen wakeup" makes it unnecessarily dramatic. There is no "stealing" involved here. > Look anywhere in the internets and you will find this, including here: > http://www.cs.berkeley.edu/~kubitron/courses/cs162/hand-outs/synch.html Thanks, will take a look. > Quite the contrary, we should be careful to never overspecify our > interfaces so that we may not change their ipmlementation details > later. The interface is already, de facto, specified by its one and only implementation, which has been in use for a long time. It's also specified by its documentation, before it was silently modified by you. There's probably code out there that relies on wait() not returning prematurately. > A good summary of the situation and the reason for the specification > can be found here: > http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups So the reason appears to be: “Spurious wakeups may sound strange, but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations.” Python is a high-level language, so making condition variable ops "substantially" slower is not a blocking issue for us. What follows is even less enlightening: “The intent was to force correct/robust code by requiring predicate loops. This was driven by the provably correct academic contingent among the "core threadies" in the working group, though I don't think anyone really disagreed with the intent once they understood what it meant.” > The comment I added to the documentation clarifies the reason why > other examples in the docs were using while loops and teaches shows > people how to use these constructs in a robust manner. Teaching people to use loops is fine. But there's no need to mislead them with inaccurate descriptions of the implementation's behaviour. > Unless you disagree, I'll add these test cases in addition to the > broken one, which can then be fixed or removed later if it turns out > to be problematic to other people. Really, can you stop being aggressive? You haven't proved that test is wrong, and it passes consistently fine on all buildbots. If you don't agree with what is being tested, that's another matter. But please accept that the test is ok w.r.t. Condition's specified behaviour. |
|||
msg157976 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-10 18:39 | |
I'm proposing the following changes to the threading docs. Most of them are cleanups and small improvements, but I also rewrote the offending paragraph, and made the wait_for example more proeminent. |
|||
msg157980 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-04-10 20:21 | |
> I'm proposing the following changes to the threading docs. Most of them are cleanups and small improvements, but I also rewrote the offending paragraph, and made the wait_for example more proeminent. Looks good to me. |
|||
msg157984 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-04-10 20:57 | |
New changeset 2f51dca92883 by Antoine Pitrou in branch '3.2': Issue #8799: Fix and improve the threading.Condition documentation. http://hg.python.org/cpython/rev/2f51dca92883 |
|||
msg157987 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-10 20:59 | |
Ok, I've fixed the docs in 3.2 and 3.3. |
|||
msg158013 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-11 09:19 | |
A few comments: 1) with cv: make_an_item_available() + cv.notify() 2) one of the benefits of wait_for() is that it automates the tricky timekeeping needed if one wants an somewhat accurate timeout, you may want to mention this specifically. 3) the offending part. I see that you don't want to use the technical terms of spurious wakeups or stolen wakeups. That's ok, but the resulting explanation is somwhat bogus. I suggest the following: The ``while`` loop checking for the application's condition is necessary because :meth:`~Condition.wait` can return after an arbitrary long time, and the condition which prompted the :meth:`~Condition.notify` call may not yet, or no longer, hold true. This is an inherent property of multi-threaded programming and the use of this pattern is essential for the robust use of Condition objects. |
|||
msg158017 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-11 10:12 | |
> A few comments: > 1) > with cv: > make_an_item_available() > + cv.notify() Did I forget this? Ow. > 2) one of the benefits of wait_for() is that it automates the tricky > timekeeping needed if one wants an somewhat accurate timeout, you may > want to mention this specifically. Ok. > 3) the offending part. I see that you don't want to use the technical > terms of spurious wakeups or stolen wakeups. That's ok, but the > resulting explanation is somwhat bogus. I suggest the following: > > The ``while`` loop checking for the application's condition is > necessary > because :meth:`~Condition.wait` can return after an arbitrary long > time, and the condition which prompted the :meth:`~Condition.notify` > call may not yet, or no longer, hold true. This is an inherent > property of multi-threaded programming and the use of this pattern is > essential for the robust use of Condition objects. But, once again, "the condition may not yet hold true" is false. However, the rest of the suggestion looks good, thanks. |
|||
msg158019 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2012-04-11 10:50 | |
> But, once again, "the condition may not yet hold true" is false. In our current implementation, yes. But it is intentionally left undefined in the specification of the condition variable protocol, for very good reasons. While I'm fine with not mentioning it in the docs, I would be very much against us actually specifying the opposite (that early wakeups never occur) because this will unnecessarily limit our options. Since the while() loop pattern is already recommended because of the "stolen wakeup" problem, leaving the "early/spurious" wakeup behaviour" undefined is wise, since there is nothing to be gained by actually guaranteeing that there will be no early wakeups. This is just good software engineering practice. This is also why we, IMHO, shouldn't rely on this behaviour in the unittests. One should never write tests that depend on unspecified behaviour, again, good engineering practice. |
|||
msg158020 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-11 11:00 | |
Le mercredi 11 avril 2012 à 10:50 +0000, Kristján Valur Jónsson a écrit : > > But, once again, "the condition may not yet hold true" is false. > In our current implementation, yes. But it is intentionally left > undefined in the specification of the condition variable protocol, for > very good reasons. No, it is not "left undefined". If the documentation doesn't say spurious wakeups may occur, then they are not supposed to occur. Predictable behaviour is a good thing for users. > While I'm fine with not mentioning it in the docs, I would be very > much against us actually specifying the opposite (that early wakeups > never occur) because this will unnecessarily limit our options. Which options? > This is also why we, IMHO, shouldn't rely on this behaviour in the > unittests. Disagreed. Unit tests should definitely protect against the introduction of bugs (willingly or not). And unpredictable behaviour is usually considered a bug. If you think the condition variable specification should be changed, you can always ask for approval on python-dev. But I don't even see the point: you are not demonstrating any *practical* advantage in doing so. |
|||
msg158022 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-04-11 11:32 | |
[...] > Disagreed. Unit tests should definitely protect against the introduction > of bugs (willingly or not). And unpredictable behaviour is usually > considered a bug. > > If you think the condition variable specification should be changed, you > can always ask for approval on python-dev. But I don't even see the > point: you are not demonstrating any *practical* advantage in doing so. One can imagine, for example, that another implementation (or maybe CPython in a later version) exposes native condition variables on POSIX, instead of emulating them (e.g. what happended for TLS implementation). In that case, we could get spurious wakeups, and code not designed to cope with them would break. |
|||
msg158023 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-11 11:40 | |
Le mercredi 11 avril 2012 à 11:32 +0000, Charles-François Natali a écrit : > One can imagine, for example, that another implementation (or maybe > CPython in a later version) exposes native condition variables on > POSIX, instead of emulating them (e.g. what happended for TLS > implementation). That's right, but spurious wakeups would be a good argument against accepting such an implementation. Or that implementation should be fixed to avoid them. That said, we couldn't use the POSIX implementation anyway, since our API allows the user to choose the type of lock, while POSIX AFAICT would only accept a POSIX mutex (which doesn't have the same semantics as an arbitrary Python lock). |
|||
msg158062 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2012-04-11 17:44 | |
Ok, doc improved in 9d4109af8f3b. |
|||
msg158076 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-04-11 21:23 | |
> Ok, doc improved in 9d4109af8f3b. LGTM. Kristján, how about updating your patch to only fix the original problem you spotted (notify() called before wait()), then we can see the remaining parts? |
|||
msg202319 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2013-11-07 09:18 | |
Ah, here we are some 18 months later. Let me have another go. |
|||
msg202420 - (view) | Author: Kristján Valur Jónsson (kristjan.jonsson) * | Date: 2013-11-08 14:13 | |
Here is an updated patch that only adds the necessary safety delay to the main thread. It also explains these in the comments. |
|||
msg202614 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-11-11 11:32 | |
New changeset 9e48c9538a7f by Kristjan Valur Jonsson in branch 'default': Issue #8799: Reduce timing sensitivity of condition test by explicitly http://hg.python.org/cpython/rev/9e48c9538a7f |
|||
msg296610 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-06-22 00:09 | |
New changeset da6d305b6fcd49ba1224b1fd2131d7648a5be6b9 by Victor Stinner in branch '2.7': bpo-8799: Reduce timing sensitivity of condition test by explicitly (#2320) https://github.com/python/cpython/commit/da6d305b6fcd49ba1224b1fd2131d7648a5be6b9 |
|||
msg296611 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-06-22 00:11 | |
The race condition is not dead, at least in Python 2.7 :-) I backported the change for bpo-30727: "[2.7] test_threading.ConditionTests.test_notify() hangs randomly on Python 2.7". |
|||
msg321776 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-07-16 22:25 | |
This issue discussed different things: * test_threading was unable: I didn't see such failure the last 12 months, and I backported enhancements from master to 2.7, so I consider this issue as fixed * documentation of threading locks: fixed * Except on Windows, Python 2.7 uses its own implementation of TLS: this issue has been fixed in Python 3. All discussed issues have been fixed, so I close the issue. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:01 | admin | set | github: 53045 |
2018-07-16 22:25:49 | vstinner | set | status: open -> closed resolution: fixed messages: + msg321776 stage: resolved |
2017-06-22 00:11:02 | vstinner | set | messages: + msg296611 |
2017-06-22 00:09:42 | vstinner | set | nosy:
+ vstinner messages: + msg296610 |
2017-06-22 00:00:00 | vstinner | set | pull_requests: + pull_request2371 |
2013-11-11 11:32:22 | python-dev | set | messages: + msg202614 |
2013-11-08 14:13:39 | kristjan.jonsson | set | files:
+ lock_tests.patch messages: + msg202420 |
2013-11-07 09:18:57 | kristjan.jonsson | set | messages: + msg202319 |
2013-11-06 19:35:42 | akuchling | set | keywords: - easy |
2012-04-11 21:23:29 | neologix | set | messages: + msg158076 |
2012-04-11 17:44:28 | pitrou | set | messages: + msg158062 |
2012-04-11 11:40:33 | pitrou | set | messages: + msg158023 |
2012-04-11 11:32:43 | neologix | set | messages: + msg158022 |
2012-04-11 11:00:01 | pitrou | set | messages: + msg158020 |
2012-04-11 10:50:35 | kristjan.jonsson | set | messages: + msg158019 |
2012-04-11 10:12:19 | pitrou | set | messages: + msg158017 |
2012-04-11 09:19:08 | kristjan.jonsson | set | messages: + msg158013 |
2012-04-10 20:59:47 | pitrou | set | messages: + msg157987 |
2012-04-10 20:57:09 | python-dev | set | nosy:
+ python-dev messages: + msg157984 |
2012-04-10 20:21:11 | neologix | set | messages: + msg157980 |
2012-04-10 18:39:35 | pitrou | set | files:
+ tpconditiondoc.patch messages: + msg157976 |
2012-04-10 18:03:33 | pitrou | set | messages: + msg157973 |
2012-04-10 18:01:50 | neologix | set | messages: + msg157972 |
2012-04-10 17:19:39 | kristjan.jonsson | set | messages: + msg157968 |
2012-04-10 10:38:59 | pitrou | set | messages: + msg157944 |
2012-04-10 10:05:20 | kristjan.jonsson | set | messages: + msg157942 |
2012-04-09 15:01:45 | pitrou | set | nosy:
+ neologix messages: + msg157847 |
2012-04-09 13:29:01 | kristjan.jonsson | set | files:
+ lock_tests.diff messages: + msg157844 versions: + Python 3.3, - Python 2.7, Python 3.2 |
2010-06-29 08:44:31 | csernazs | set | nosy:
+ csernazs |
2010-05-24 12:11:45 | kristjan.jonsson | set | keywords:
patch, patch, easy messages: + msg106360 |
2010-05-24 11:43:21 | pitrou | set | versions:
+ Python 3.2 nosy: + jyasskin, pitrou messages: + msg106355 keywords: patch, patch, easy |
2010-05-24 09:44:15 | kristjan.jonsson | create |