classification
Title: Hang in lib/test/test_threading.py
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: csernazs, jyasskin, kristjan.jonsson, neologix, pitrou, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2010-05-24 09:44 by kristjan.jonsson, last changed 2018-07-16 22:25 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
locktest.patch kristjan.jonsson, 2010-05-24 09:44 review
lock_tests.diff kristjan.jonsson, 2012-04-09 13:29 review
tpconditiondoc.patch pitrou, 2012-04-10 18:39
lock_tests.patch kristjan.jonsson, 2013-11-08 14:13 updated ro review
Pull Requests
URL Status Linked Edit
PR 2320 merged vstinner, 2017-06-22 00:00
Messages (28)
msg106349 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-05-24 09:44
The tests for the ConditionVariable a fragile.  On a slow computer, or when running a DEBUG build, the main thread can issue a condition.notify() call, even though the worker threads have not settled in on a wait() call.  This causes the test to stall.
The provided patch "fixes" this by adding a short _wait() before the notify.
msg106355 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-24 11:43
Strange, I have never noticed this (even on the buildbots).
msg106360 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-05-24 12:11
Well, I saw this occasionally on my dual core laptop when working on the RWLock.  A notify() before a target does a wait() is ineffective and it would wait indefinitely after the notify(5) call.  This isn't a "proper" fix (I had another, more complex, where we would keep a counter for the number of successful "waiters" and wait for that to reach the target value) but it is in the spirit of many of these tests, where we rely on _wait() to be sufficient to reach a certain state.
msg157844 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-09 13:29
Here is a new patch.
1) I´ve simplified and relaxed test_notify() for Condition objects.  Condition variables don't guarantee that there won't be any spurious wakeups so the test must maintain internal bookkeeping so that it doesn't break with a different implementation of Condition.
2) The main thread now properly waits for the workers to "settle" in the test.
2) I've added two generic tests of Condition objects in their "natural environment" as building blocks for bigger things, a Queue and a Lock.
msg157847 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-09 15:01
> Condition variables don't guarantee that there won't be any spurious
> wakeups

What do you mean? The implementation doesn't seem prone to spurious wakeups, and the docs don't say so either.

> I've added two generic tests of Condition objects in their "natural
> environment" as building blocks for bigger things, a Queue and a Lock.

Why would you build a Lock out of a Condition?
msg157942 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-10 10:05
We shouldn't be testing implementation details.  The Condition variables are canonically prone to "spurious wakeups" and "stolen wakeups".  The fact that the current implementation doesn't have them shouldn't place that burden on future implementations of threading.Condition on this platform.
From the docs: "Note: Condition variables can be, depending on the implementation, subject to both spurious wakeups (when wait() returns without a notify() call) and stolen wakeups (when another thread acquires the lock before the awoken thread.) For this reason, it is always necessary to verify the state the thread is waiting for when wait() returns and optionally repeat the call as often as necessary."

2) You would build a lock yourself if you wanted exact control over its behaviour.  In this case, we are just doing what threading.Lock() does, but it could be more complicated.  At any rate, it proves a useful toy usage of a condition variable in the context of making sure that it works properly as part of the unit tests.

I could leave the original test in place and just add the new ones, but the old test is at least provably a subject to a race condition as per the original issue.
msg157944 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-10 10:38
> The Condition variables are canonically prone to "spurious wakeups"
> and "stolen wakeups".

No, they aren't. Just because POSIX says they are doesn't mean *our*
condition variables are the same. Spurious wakeups are an annoyance, and
our implementation AFAICT never exhibited them.

> From the docs: "Note: Condition variables can be, depending on the
> implementation, subject to both spurious wakeups (when wait() returns
> without a notify() call) and stolen wakeups (when another thread
> acquires the lock before the awoken thread.) For this reason, it is
> always necessary to verify the state the thread is waiting for when
> wait() returns and optionally repeat the call as often as necessary."

Ah, thanks, indeed. Except that...
this was added by yourself in 483bbebc57bf, after issue 10260, but
*without* being part of the original patch that you uploaded on that
issue.
So this never got reviewed and was instead sneaked in the docs in a
commit of yours.
Unless other people disagree, I think this addition should be reverted.
msg157968 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-10 17:19
Stolen wakeups are a fact of life though, even in cpython's implementation of condition variables.  And for the user of a condition variable the difference is so subtle as to not warrant special mention.

This is, btw, not just a posix thing.  It is same on windows, java, and in fact, comes from the original definition of a condition variable as part of the "monitor" pattern.  Condition variables are defined have this caveat to allow for flexibility in their implementation.
Look anywhere in the internets and you will find this, including here:
http://www.cs.berkeley.edu/~kubitron/courses/cs162/hand-outs/synch.html

The canonical and correct way to use a condition variable is to repeatedly wait() and verify that the wakeup condition holds.  The wait_for() method abstracts this into a convenient helper.  There is no need to befuddle the issue by special casing the python version as being guaranteed to not have suprious wakeups, but otherwise do have the stolen thread problem.  Quite the contrary, we should be careful to never overspecify our interfaces so that we may not change their ipmlementation details later.

A good summary of the situation and the reason for the specification can be found here:
http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups

The comment I added to the documentation clarifies the reason why other examples in the docs were using while loops and teaches shows people how to use these constructs in a robust manner.  Removing it is a bad idea and I think you will find others will agree.  Take it up on python-dev if you want.

Unless you disagree, I'll add these test cases in addition to the broken one, which can then be fixed or removed later if it turns out to be problematic to other people.
msg157972 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-10 18:01
>> The Condition variables are canonically prone to "spurious wakeups"
>> and "stolen wakeups".
>
> No, they aren't. Just because POSIX says they are doesn't mean *our*
> condition variables are the same. Spurious wakeups are an annoyance, and
> our implementation AFAICT never exhibited them.

I agree with Antoine.
Spurious wakeups are an implementation issue (mainly has to do with
efficiency on SMP systems and interrupted syscalls), but are
definitely not a part of the "standard" API. Since our implementation
doesn't exhibit this problem, there's no reason to scare and confuse
people.
For example, our locks are special in the sense that they can be
released by a thread other than the one that acquired it, etc, and
this behavior is documented.

However, while I think the spurious wakeups reference could be
removed, I think that the advice of repeatedly chcking the invariant
shoud be kept, because it is always desirable (stolen wakeup, and
better logic).
msg157973 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-10 18:03
> Stolen wakeups are a fact of life though, even in cpython's
> implementation of condition variables.  And for the user of a
> condition variable the difference is so subtle as to not warrant
> special mention.

I don't think it's a subtle difference. Being woken up too late is not
the same as being woken up spuriously.
Also, the term "stolen wakeup" makes it unnecessarily dramatic. There is
no "stealing" involved here.

> Look anywhere in the internets and you will find this, including here:
> http://www.cs.berkeley.edu/~kubitron/courses/cs162/hand-outs/synch.html

Thanks, will take a look.

> Quite the contrary, we should be careful to never overspecify our
> interfaces so that we may not change their ipmlementation details
> later.

The interface is already, de facto, specified by its one and only
implementation, which has been in use for a long time. It's also
specified by its documentation, before it was silently modified by you.

There's probably code out there that relies on wait() not returning
prematurately.

> A good summary of the situation and the reason for the specification
> can be found here:
> http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups

So the reason appears to be:

        “Spurious wakeups may sound strange, but on some multiprocessor
        systems, making condition wakeup completely predictable might
        substantially slow all condition variable operations.”

Python is a high-level language, so making condition variable ops
"substantially" slower is not a blocking issue for us.

What follows is even less enlightening:

        “The intent was to force correct/robust code by requiring
        predicate loops. This was  driven by the provably correct
        academic contingent among the "core threadies" in the working
        group, though I don't think anyone really disagreed with the
        intent once they understood what it meant.”

> The comment I added to the documentation clarifies the reason why
> other examples in the docs were using while loops and teaches shows
> people how to use these constructs in a robust manner.

Teaching people to use loops is fine. But there's no need to mislead
them with inaccurate descriptions of the implementation's behaviour.

> Unless you disagree, I'll add these test cases in addition to the
> broken one, which can then be fixed or removed later if it turns out
> to be problematic to other people.

Really, can you stop being aggressive? You haven't proved that test is
wrong, and it passes consistently fine on all buildbots.

If you don't agree with what is being tested, that's another matter. But
please accept that the test is ok w.r.t. Condition's specified
behaviour.
msg157976 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-10 18:39
I'm proposing the following changes to the threading docs. Most of them are cleanups and small improvements, but I also rewrote the offending paragraph, and made the wait_for example more proeminent.
msg157980 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-10 20:21
> I'm proposing the following changes to the threading docs. Most of them are cleanups and small improvements, but I also rewrote the offending paragraph, and made the wait_for example more proeminent.

Looks good to me.
msg157984 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-10 20:57
New changeset 2f51dca92883 by Antoine Pitrou in branch '3.2':
Issue #8799: Fix and improve the threading.Condition documentation.
http://hg.python.org/cpython/rev/2f51dca92883
msg157987 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-10 20:59
Ok, I've fixed the docs in 3.2 and 3.3.
msg158013 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-11 09:19
A few comments:
1)
   with cv:
       make_an_item_available()
+       cv.notify()

2) one of the benefits of wait_for() is that it automates the tricky timekeeping needed if one wants an somewhat accurate timeout, you may want to mention this specifically.

3) the offending part.  I see that you don't want to use the technical terms of spurious wakeups or stolen wakeups. That's ok, but the resulting explanation is somwhat bogus.  I suggest the following:

The ``while`` loop checking for the application's condition is necessary
because :meth:`~Condition.wait` can return after an arbitrary long time, and the condition which prompted the :meth:`~Condition.notify` call may not yet, or no longer, hold true.  This is an inherent property of multi-threaded programming and the use of this pattern is essential for the robust use of Condition objects.
msg158017 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 10:12
> A few comments:
> 1)
>    with cv:
>        make_an_item_available()
> +       cv.notify()

Did I forget this? Ow.

> 2) one of the benefits of wait_for() is that it automates the tricky
> timekeeping needed if one wants an somewhat accurate timeout, you may
> want to mention this specifically.

Ok.

> 3) the offending part.  I see that you don't want to use the technical
> terms of spurious wakeups or stolen wakeups. That's ok, but the
> resulting explanation is somwhat bogus.  I suggest the following:
> 
> The ``while`` loop checking for the application's condition is
> necessary
> because :meth:`~Condition.wait` can return after an arbitrary long
> time, and the condition which prompted the :meth:`~Condition.notify`
> call may not yet, or no longer, hold true.  This is an inherent
> property of multi-threaded programming and the use of this pattern is
> essential for the robust use of Condition objects.

But, once again, "the condition may not yet hold true" is false.
However, the rest of the suggestion looks good, thanks.
msg158019 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-11 10:50
> But, once again, "the condition may not yet hold true" is false.
In our current implementation, yes.  But it is intentionally left undefined in the specification of the condition variable protocol, for very good reasons.
While I'm fine with not mentioning it in the docs, I would be very much against us actually specifying the opposite (that early wakeups never occur) because this will unnecessarily limit our options.  Since the while() loop pattern is already recommended because of the "stolen wakeup" problem, leaving the "early/spurious" wakeup behaviour" undefined is wise, since there is nothing to be gained by actually guaranteeing that there will be no early wakeups.  This is just good software engineering practice.

This is also why we, IMHO, shouldn't rely on this behaviour in the unittests.  One should never write tests that depend on unspecified behaviour, again, good engineering practice.
msg158020 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 11:00
Le mercredi 11 avril 2012 à 10:50 +0000, Kristján Valur Jónsson a
écrit :
> > But, once again, "the condition may not yet hold true" is false.
> In our current implementation, yes.  But it is intentionally left
> undefined in the specification of the condition variable protocol, for
> very good reasons.

No, it is not "left undefined". If the documentation doesn't say
spurious wakeups may occur, then they are not supposed to occur.
Predictable behaviour is a good thing for users.

> While I'm fine with not mentioning it in the docs, I would be very
> much against us actually specifying the opposite (that early wakeups
> never occur) because this will unnecessarily limit our options.

Which options?

> This is also why we, IMHO, shouldn't rely on this behaviour in the
> unittests.

Disagreed. Unit tests should definitely protect against the introduction
of bugs (willingly or not). And unpredictable behaviour is usually
considered a bug.

If you think the condition variable specification should be changed, you
can always ask for approval on python-dev. But I don't even see the
point: you are not demonstrating any *practical* advantage in doing so.
msg158022 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-11 11:32
[...]
> Disagreed. Unit tests should definitely protect against the introduction
> of bugs (willingly or not). And unpredictable behaviour is usually
> considered a bug.
>
> If you think the condition variable specification should be changed, you
> can always ask for approval on python-dev. But I don't even see the
> point: you are not demonstrating any *practical* advantage in doing so.

One can imagine, for example, that another implementation (or maybe
CPython in a later version) exposes native condition variables on
POSIX, instead of emulating them (e.g. what happended for TLS
implementation).
In that case, we could get spurious wakeups, and code not designed to
cope with them would break.
msg158023 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 11:40
Le mercredi 11 avril 2012 à 11:32 +0000, Charles-François Natali a
écrit :
> One can imagine, for example, that another implementation (or maybe
> CPython in a later version) exposes native condition variables on
> POSIX, instead of emulating them (e.g. what happended for TLS
> implementation).

That's right, but spurious wakeups would be a good argument against
accepting such an implementation. Or that implementation should be fixed
to avoid them.

That said, we couldn't use the POSIX implementation anyway, since our
API allows the user to choose the type of lock, while POSIX AFAICT would
only accept a POSIX mutex (which doesn't have the same semantics as an
arbitrary Python lock).
msg158062 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-11 17:44
Ok, doc improved in 9d4109af8f3b.
msg158076 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-04-11 21:23
> Ok, doc improved in 9d4109af8f3b.

LGTM.

Kristján, how about updating your patch to only fix the original
problem you spotted (notify() called before wait()), then we can see
the remaining parts?
msg202319 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-11-07 09:18
Ah, here we are some 18 months later.  Let me have another go.
msg202420 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-11-08 14:13
Here is an updated patch that only adds the necessary safety delay to the main thread.  It also explains these in the comments.
msg202614 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-11 11:32
New changeset 9e48c9538a7f by Kristjan Valur Jonsson in branch 'default':
Issue #8799: Reduce timing sensitivity of condition test by explicitly
http://hg.python.org/cpython/rev/9e48c9538a7f
msg296610 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-22 00:09
New changeset da6d305b6fcd49ba1224b1fd2131d7648a5be6b9 by Victor Stinner in branch '2.7':
bpo-8799: Reduce timing sensitivity of condition test by explicitly (#2320)
https://github.com/python/cpython/commit/da6d305b6fcd49ba1224b1fd2131d7648a5be6b9
msg296611 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-22 00:11
The race condition is not dead, at least in Python 2.7 :-) I backported the change for bpo-30727: "[2.7] test_threading.ConditionTests.test_notify() hangs randomly on Python 2.7".
msg321776 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-16 22:25
This issue discussed different things:

* test_threading was unable: I didn't see such failure the last 12 months, and I backported enhancements from master to 2.7, so I consider this issue as fixed
* documentation of threading locks: fixed
* Except on Windows, Python 2.7 uses its own implementation of TLS: this issue has been fixed in Python 3.

All discussed issues have been fixed, so I close the issue.
History
Date User Action Args
2018-07-16 22:25:49vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg321776

stage: resolved
2017-06-22 00:11:02vstinnersetmessages: + msg296611
2017-06-22 00:09:42vstinnersetnosy: + vstinner
messages: + msg296610
2017-06-22 00:00:00vstinnersetpull_requests: + pull_request2371
2013-11-11 11:32:22python-devsetmessages: + msg202614
2013-11-08 14:13:39kristjan.jonssonsetfiles: + lock_tests.patch

messages: + msg202420
2013-11-07 09:18:57kristjan.jonssonsetmessages: + msg202319
2013-11-06 19:35:42akuchlingsetkeywords: - easy
2012-04-11 21:23:29neologixsetmessages: + msg158076
2012-04-11 17:44:28pitrousetmessages: + msg158062
2012-04-11 11:40:33pitrousetmessages: + msg158023
2012-04-11 11:32:43neologixsetmessages: + msg158022
2012-04-11 11:00:01pitrousetmessages: + msg158020
2012-04-11 10:50:35kristjan.jonssonsetmessages: + msg158019
2012-04-11 10:12:19pitrousetmessages: + msg158017
2012-04-11 09:19:08kristjan.jonssonsetmessages: + msg158013
2012-04-10 20:59:47pitrousetmessages: + msg157987
2012-04-10 20:57:09python-devsetnosy: + python-dev
messages: + msg157984
2012-04-10 20:21:11neologixsetmessages: + msg157980
2012-04-10 18:39:35pitrousetfiles: + tpconditiondoc.patch

messages: + msg157976
2012-04-10 18:03:33pitrousetmessages: + msg157973
2012-04-10 18:01:50neologixsetmessages: + msg157972
2012-04-10 17:19:39kristjan.jonssonsetmessages: + msg157968
2012-04-10 10:38:59pitrousetmessages: + msg157944
2012-04-10 10:05:20kristjan.jonssonsetmessages: + msg157942
2012-04-09 15:01:45pitrousetnosy: + neologix
messages: + msg157847
2012-04-09 13:29:01kristjan.jonssonsetfiles: + lock_tests.diff

messages: + msg157844
versions: + Python 3.3, - Python 2.7, Python 3.2
2010-06-29 08:44:31csernazssetnosy: + csernazs
2010-05-24 12:11:45kristjan.jonssonsetkeywords: patch, patch, easy

messages: + msg106360
2010-05-24 11:43:21pitrousetversions: + Python 3.2
nosy: + jyasskin, pitrou

messages: + msg106355

keywords: patch, patch, easy
2010-05-24 09:44:15kristjan.jonssoncreate