classification
Title: bug in PyThread_release_lock()
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: gvanrossum, navytux, shihao, tim.peters
Priority: normal Keywords:

Created on 2001-06-16 02:17 by shihao, last changed 2019-09-11 11:34 by navytux. This issue is now closed.

Messages (13)
msg5065 - (view) Author: Shih-Hao Liu (shihao) Date: 2001-06-16 02:17
Mutex should be hold when calling
pthread_cond_signal().  This function should look like:


PyThread_release_lock(PyThread_type_lock lock)
{
        pthread_lock *thelock = (pthread_lock *)lock;
        int status, error = 0;

        dprintf(("PyThread_release_lock(%p) called\n",
lock));

        status = pthread_mutex_lock( &thelock->mut );
        CHECK_STATUS("pthread_mutex_lock[3]");

        thelock->locked = 0;
        /* ***** call pthread_cond_signal before unlock
mutex */
        status = pthread_cond_signal(
&thelock->lock_released );
        CHECK_STATUS("pthread_cond_signal");

        status = pthread_mutex_unlock( &thelock->mut );
        CHECK_STATUS("pthread_mutex_unlock[3]");

        /* wake up someone (anyone, if any) waiting on
the lock */
}
msg5066 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-06-17 21:16
Logged In: YES 
user_id=31435

Why?  It's allowed to signal the condition whether or not 
the mutex is held.  Since changing this can have visible 
effects on thread scheduling, I'm reluctant to change it 
without a good reason.
msg5067 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-06-17 21:51
Logged In: YES 
user_id=6380

Set status to closed -- no need to delete it.
msg5068 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-06-17 22:01
Logged In: YES 
user_id=31435

Ack, did I delete this?!  I sure didn't intend to -- didn't 
even intend to close it.  Reopened pending more info.
msg5069 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-06-17 23:02
Logged In: YES 
user_id=31435

Closing this again, as it appears the original submitter 
deleted it.  shihao, if you want to pursure this, open it 
again.
msg5070 - (view) Author: Shih-Hao Liu (shihao) Date: 2001-06-18 06:01
Logged In: YES 
user_id=246388

I closed it because I thought the thelock->locked variable 
will ensure that the PyThread_release_lock will help to 
protect the condition variable and I was wrong.  The 
linuxthread man page on pthread_cond_signal:

A  condition  variable  must  always  be associated with a
mutex, to avoid the race condition where a thread prepares
to wait on a condition variable and another thread signals
the condition just before the first thread actually  waits
on it.

which means you can't call pthread_cond_signal & 
pthread_cond_wait on the same condition variable at the 
same time.  And using a mutex to protect them is a good 
idea.  Here is how thing might go wrong with current 
implementation:

    thread 1                            thread 2        
                                                        
                            |int PyThread_acquire_lock _
                            |/** assume lock was acquired
                            |  by thread 1, hence locked=0 
                            |  & success would be 0  **/
                            |{                          
                            |  ...                      
                            |  status = pthread_mutex_lo
                            |  CHECK_STATUS("pthread_mut
                            |  success = thelock->locked
                            |  if (success) thelock->loc
                            |  status = pthread_mutex_un
                            |  /** thread 2 suspended **/
void PyThread_release_lock _|                           
{                           |                           
  ...                       |                           
  status = pthread_mutex_loc|                           
  CHECK_STATUS("pthread_mute|                           
                            |                           
  thelock->locked = 0;      |                           
                            |                           
  status = pthread_mutex_unl|                           
/** thread 1 suspend **/    |                           
                            |  CHECK_STATUS("pthread_mut
                            |                           
                            |  if ( !success && waitflag
                            |    /* continue trying unti
                            |                           
                            |    /* mut must be locked b
                            |     * protocol */         
                            |    status = pthread_mutex_
                            |    CHECK_STATUS("pthread_m
                            |    while ( thelock->locked
                            |      status = pthread_cond
                            |/** thread 2 suspended while
                            |    updating shared data **
  CHECK_STATUS("pthread_mute|                           
                            |                           
  /* wake up someone (anyone|                           
  status = pthread_cond_sign|                           
/** thread 1 update shared  |                           
  data and corrupt it. **/  |                           

Not sure what the effect would be.  It's wouldn't be nice 
anyway.



msg5071 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-06-19 20:52
Logged In: YES 
user_id=31435

It appears you're concerned that the signal will "get lost" 
in this scenario.  I agree that it may, but it doesn't 
matter:  thread 2's "while (thelock->locked)" test fails 
because thread 1 already set thelock->locked to 0, so 
thread 2 doesn't execute its pthread_cond_wait, so it 
doesn't matter that nobody is *waiting* to see thread 1's 
pthread_cond_signal.  IOW, the condition thread 1 will try 
to signal has *already* been detected by thread 2, and the 
signal is useless (because redundant) information.  Indeed, 
it's a beauty of the condition protocol that signals can be 
sloppy.

The linuxthread man page is worded strangely here, but note 
that it does not say the mutex *must* be locked.  They 
can't, either, because POSIX doesn't require it; while 
POSIX stds aren't available freely online, some derived 
specs are, and are much clearer about this; e.g., see the 
Single Unix Specification, here:

<http://www.opengroup.org/onlinepubs/7908799/xsh/pthread_con
d_signal.html>

I'll tell you why I don't *want* to change this:  in 
Python's use of the global interpreter lock, it's almost 
always the case that someone is waiting on the lock.  By 
releasing the mutex before signaling, this gives a waiter a 
chance to run immediately upon calling 
pthread_cond_signal.  Else, because pthread_cond_wait 
(which the waiters are executing) has to lock the mutex, if 
the signaler is holding the mutex during the signal, 
pthread_cond_signal can't finish the job -- it was to go 
back to the signaler and let it unlock the mutex first 
before a waiter can proceed.  This makes the region of 
exclusion longer than it needs to be.

So if there's not an actual race problem here (& I still 
don't see one), I don't want to change this.
msg5072 - (view) Author: Shih-Hao Liu (shihao) Date: 2001-06-19 21:41
Logged In: YES 
user_id=246388

Oops.
The problem will be arised if there is a thread 3 called
PyThread_acquire_lock after thread 1 set thelock->locked to
0 and before thread 2 calling pthread_mutex_lock. "while
(thelock->locked)" will success for thread 2 and it will
call pthread_cond_wait and might collide with thread 1's
pthread_cond_signal.
msg5073 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-06-19 22:06
Logged In: YES 
user_id=31435

> The problem will be arised if there is a thread 3
> called PyThread_acquire_lock 

You never mentioned thread 3 again, so I don't know what it 
has to do with this.

> after thread 1 set thelock->locked to 0 and before
> thread 2 calling pthread_mutex_lock. 

I understand both of those.  Are you assuming that, e.g., 
thread 3's PyThread_acquire_lock completes in whole during 
this gap?  I don't know what else you could mean, so let's 
assume that.

> "while (thelock->locked)" will success for thread 2

Sure.

> and it will call pthread_cond_wait

Yup.

> and might collide with thread 1's pthread_cond_signal.

What does "collide" mean to you?  All the pthread_cond_xxx 
functions must be implemented as if atomic, so there's no 
meaningful sense (to me) in which they can collide -- 
unless they're implemented incorrectly.

Assuming they are implemented correctly, it again doesn't 
matter that thread 2 misses thread 1's signal, because 
thread *3* exploited the information thread 1 was going to 
signal, by acquiring the lock.  It's actually good that 
thread 2 isn't bothered with it:  there's no real info in 
the signal anymore (at best, if thread 2 got it, it would 
wake up and go "oops! it's still locked; I'll wait again").

All that matters now is whether thread 2 gets a chance to 
see thread *3*'s signal, at the time thread 3 releases the 
lock.  And it will, because thread 3 can't release the lock 
without acquiring the mutex first, and thread 2 holds the 
mutex at all times except when in its pthread_cond_wait 
call (so thread 3 can't release the lock except when thread 
2 is in pthread_cond_wait).

Note that I'm not at all concerned about "fairness" here, 
only about races.
msg5074 - (view) Author: Shih-Hao Liu (shihao) Date: 2001-06-20 05:14
Logged In: YES 
user_id=246388

> I understand both of those.  Are you assuming that, e.g., 
> thread 3's PyThread_acquire_lock completes in whole 
during 
> this gap?  I don't know what else you could mean, so 
let's 
> assume that.

Yes, I assume thread 3 completed in this gap and it is 
possible.

"Collide" means while thread 2's pthread_cond_wait was 
modifiying the internal data structure, thread 1 calls 
pthread_cond_signal.

The point here is not missing signals, the problem is the 
possiblity that pthread_cond_signal preempt the execution 
of pthread_cond_wait.  I can't find any document says 
pthread_cond_xxx functions must be automic operations.

In pthread_cond_xxx man page, it only mentioned:

pthread_cond_wait atomically unlocks  the  mutex  (as  per
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pthread_unlock_mutex) and waits for the condition variable
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cond to be signaled. The thread execution is suspended and
^^^^^^^^^^^^^^^^^^^
does not consume any CPU time until the condition variable
is signaled. The mutex  must  be  locked  by  the  calling
thread on entrance to pthread_cond_wait.  Before returning
to the calling thread, pthread_cond_wait re-acquires mutex
(as per pthread_lock_mutex).

Unlocking  the mutex and suspending on the condition vari­
able is done  atomically.  Thus,  if  all  threads  always
acquire  the  mutex  before  signaling the condition, this
guarantees that the condition cannot be signaled (and thus
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
ignored) between the time a thread locks the mutex and the
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
time it waits on the condition variable.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I take it that between the time the mutex is locked by 
pthread_mutex_lock and unlocked by pthread_cond_wait, you 
can't call pthread_cond_signal.  I also browse through 
LinuxThread implementation and can't find pthread_cond_xxxx 
is implemented automically.

I found there is another way to fix this without having to 
call the pthread_cond_signal while holding the mutex.  If 
we do:

if (!thelock->locked)
  status = pthread_cond_signal( &thelock->lock_released );

we guarantee that when pthread_cond_signal is called, the 
acquire_lock code will not be in the middle of 
pthread_cond_signal.  


msg5075 - (view) Author: Shih-Hao Liu (shihao) Date: 2001-06-20 07:17
Logged In: YES 
user_id=246388

> I also browse through 
> LinuxThread implementation and can't find 
> pthread_cond_xxxx is implemented automically.

I found they do call __pthread_lock when updating the 
priority queue after take a closer look.
I guess I have to look at elsewhere to figure out why my 
Python script is spinning.
msg5076 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-10-18 16:18
Logged In: YES 
user_id=31435

Deleted this bug report, at Shih-Hao Liu's request.
msg351829 - (view) Author: Kirill Smelkov (navytux) * Date: 2019-09-11 11:34
I still believe there is a race here and it can lead to MEMORY CORRUPTION and DEADLOCK: https://bugs.python.org/issue38106.
History
Date User Action Args
2019-09-11 11:34:32navytuxsetnosy: + navytux
messages: + msg351829
2001-06-16 02:17:17shihaocreate