classification
Title: Thread Safe Py_AddPendingCall
Type: crash Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, christian.heimes, kristjan.jonsson, mark.dickinson, pitrou
Priority: normal Keywords: patch

Created on 2008-11-10 11:36 by kristjan.jonsson, last changed 2009-01-20 12:02 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
pendingalls.patch kristjan.jonsson, 2008-11-10 14:05
pendingalls.patch kristjan.jonsson, 2008-11-14 16:59 version 3 of the patch
pendingalls.patch kristjan.jonsson, 2008-11-17 14:52 avoid deadlocks on the main thread only.
pendingalls.patch kristjan.jonsson, 2008-11-17 16:19 new complete patch, checks for NULL pending_lock
pendingalls.patch kristjan.jonsson, 2008-11-17 19:05 new version that doesn't assume signals on the main thread
xmlrpc.patch kristjan.jonsson, 2009-01-05 13:44 Some fixes suggested by Amaury
testcapi.patch kristjan.jonsson, 2009-01-05 14:21 test cases for Py_AddPendingCall()
testcapi.patch kristjan.jonsson, 2009-01-06 13:11 testing the asynchronous notify, with suggested changes
pendingalls.patch kristjan.jonsson, 2009-01-06 21:42 full patch including revised documentation
Messages (32)
msg75691 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-10 11:36
At CCP We have started using the Py_AddPendingCall() mechanism to 
signal python about a completed IO operation.
However, we noticed that the existing mechanism was hoplelessly un-
thread safe.  This is bad, since on Windows at least, it is very 
convenient to have such callbacks happen on an arbitrary thread from 
the system's thread pool.
I submit a thread-safe implementation instead to be used if WITH_THREAD 
is defined.
This allows Py_AddPendingCall() to be called from any thread, from any 
context, even from a PendingCall callback itself.
msg75692 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-10 11:53
Your patch sounds like a useful addition. However I can comment on the
details because I haven't dealt with Py_AddPendingCall() yet.

Documentation updates and a unit test (Modules/_testcapimodule.c), that
shows the bug and verifies your patch, are welcome, too.
msg75693 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-10 11:59
Good point.  I'll make a test using ctypes and _testcapimodule.c to try 
to make it as non-platform-specific as possible.
msg75700 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-10 14:05
Okay, i add a new patch that includes tests.
I am unsure where I should add documentation for the Py_AddPendingCall
() function.  Any suggestions?
msg75782 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-12 10:09
Py_AddPendingCall is used inside signal handlers.
It seems that there is a (tiny) chance that a signal interrupts the
program inside Py_MakePendingCalls, while the pendinglock is held.
Py_AddPendingCall would try to acquire this lock again and deadlock.
msg75783 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-12 10:13
Right.
Isn't the best way to avoid it to block signals just while we pop the 
queue?  I'll see if python has a portable sigaction thing to do that.
Otherwise we'd have to start to mess with a "volatile busy" flag again 
and possibly introduce tricky race conditions.
msg75872 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-14 16:59
Here is a revised version.
Since there is no portable way to block signals, and no way at all on 
windows (apparently) the simplest way is to simply use NOWAIT_LOCK when 
adding a new pending call.  While this does not guarantee that we are 
always able to append a call (we may be unlucky in all our N tries and 
there is also no portable way to yield the thread timeslice between 
tries) such are also the semantics of the method.  Sometimes the buffer 
may be full, or we fail to get the lock in time, and a -1 is returned.  
It is up to the caller to respond appropriately, perhaps by trying again 
later.
msg75963 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-17 14:52
Small refinement:  Deadlock avoidance is only needed on the main thread 
since we only install signal handlers for the main thread.
msg75966 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-17 15:54
Two comments about the patch:

- typo in a comment: "...change chat a signal..."

- PyThread_acquire_lock should not be called with a NULL lock: on win32
this does not matter (there is a test in thread_nt.h), but Unix
segfaults in thread_pthread.h if you hit Ctrl-C.
msg75970 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-17 16:19
I had forgotten that locks aren't initialized until PyEval_InitThreads
() is called.  Now we check for this case.  Also reinitialize when fork
() is called.
Is there any suggested place where I should add documentation to this?
msg75977 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2008-11-17 19:05
I turns out that for SIGINT, windows creates a new thread.  So, the 
assumption that signals are delivered on the main thread are not valid 
in general.  Therefore, we must initialize the lock pending_lock 
independently of PyEval_InitThreads(), and also cannot assume that 
signals are deliverd on the main thread.
msg79113 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-05 09:27
Ping.
Amaury, any thoughts?
msg79119 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-05 10:29
- The things_to_do static variable should be declared as "volatile": it
is read by the main loop without any lock.
(by the way, could you rename it to something like "pendingcalls_to_do"?)

- in the old Py_MakePendingCalls function, the "#ifdef WITH_THREAD" part
is not useful now. 

- the "XXX" comments should probably be rewritten.

A note to people concerned by performance: the additional lock is used
only when there is an actual pending call to process. The main loop only
regularly checks the "things_to_do static" volatile static variable, and
the patch does not change this.
msg79128 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-05 11:19
Good points.
Any suggestion where to document the new functionality?
msg79132 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-05 12:11
Well, I could not find these functions documented anywhere :-(
If you want to start to write some, a good place could be in
http://docs.python.org/c-api/init.html (Doc/c-api/init.rst, maybe after
the PyGILState_ functions)

At the very least, a short sentence in whatsnew/2.7.rst would be
appreciated.
msg79142 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-05 13:44
Ok, here is a revised, patch.
It now contains rudimentary documentation, which probably needs to be 
proofread by the official documentation lords.
msg79147 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-05 13:59
Would it be possible to rework your code so as not to define two
versions of Py_AddPendingCall() and Py_MakePendingCalls(), only putting
`#ifdef WITH_THREAD` sections where it's necessary? Or would it make the
code too quirky?

Also, it would be nice to have a test for this specific feature (e.g.
using a dedicated function in the _testcapi module), but perhaps it
isn't a showstopper either.
msg79151 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-05 14:21
Ah, I forgot the test code in my patch.  I submit a separate patch to 
add that code.
Also, reworking things to have single function definitions is in my 
opinion too messy.  I started doing things that way, but the code 
becomes hard to read and therefore to understand and verify.
msg79152 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-05 14:56
A few comments about the test:
- it should be a method of TestCAPI; we try to unittest everywhere,
although there is some old code which predates that
- the implementation would be stressed better if each callback was added
from a separate thread, rather than adding them all at once; it will
also make the C function in _testcapi simpler (and you can do the sleep
in the Python code in test_capi.py)
- if you want the list operation in your callback to be atomic, it
should append() something to the list rather than increment its first
element; then you just have to test for the len() of the list
- in _pending_callback(), you must use Py_XDECREF(r), not Py_DECREF,
since r can be NULL

I will look at the ceval part of the patch later :)
msg79241 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-06 09:31
Thanks.
What do you mean by making the test a method of TestCAPI?  I have found 
no such class.  Are you talking about the python part or the c part?

Also, atomicity in the callback is not required, as the callback is 
never interrupted implicitly.  In fact, that would make it quite 
useless.  I'll add that to the documentation.

As for your other suggestions, they're noted, I'll try having more 
threads.
msg79255 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-06 13:11
Ok, I have addressed the issues raised about testcapi.  A new file has 
been submitted.
msg79269 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-06 16:06
Ok, the test patch is fine!

Now looking at the main patch, some comments:
- you removed "volatile" from pendingfirst and pendinglast, is it really
safe?
- what if pending_lock is not initialized when Py_AddPendingCall is
called? I know it's unlikely, but imagine someone calling that function
just after the interpreter has been initialized
- the documentation should mention that the GIL must not be held when
Py_AddPendingCall is called, otherwise there can be a deadlock (is it a
change from previous versions by the way?)
- you should check the return value of PyThread_allocate_lock() for NULL
- is your implementation reentrant? that is, registering a callback from
a callback. There is no need for it to be, but if it aims to be, then it
should perhaps be documented and tested :)
msg79270 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-06 16:12
Sorry, forget what I said about the GIL, it looks fine. Perhaps some
comments should be added to clarify that matter, however.
msg79290 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-06 21:42
1) volatile is not required when access is guarded by locks.
2) pendingcalls_to_do is initialized to 1, so the very first thing that 
the interpreter does (after executing one bytecode, I init _PyTicker to 
0 now) is to initialize this lock.  Creating a lock really must be done 
by one well known startup thread only.  There is a lot of functions in 
the python API that don't work until after python initialization 
(without that being explicitly documented), Py_AddPendingCall() is one 
of them.  Btw, there is no generic "init" function for the ceval.c 
module.  And PyEval_InitThreads isn't really suitable since we must 
have the lock, even if we haven't initialized the threads, to guard us 
against reentrancy during signal delivery.  For extra safety, I have 
added a NULL test in Py_AddPendingCall().
3)Py_AddPendingCall() can be called with our without the GIL. 
4)I've added a check in Py_MakePendingCalls().  In PyEval_ReInitThreads 
there is no way to do anything sensible in the case of error.
5)Yes it is reentrant.  The pending_lock is only held for a short while 
while we pop a callback off the queue.  However, because even during 
that short while a signal can be delivered which causes 
Py_AddPendingCall() to be called, the latter function fails if it 
doesn't acquire the lock in a reasonable amount of time.  This would 
cause a signal to be lost, and not a deadlock.  This hasn't changed 
from the old implementation.

I have also added a flag to make sure that we don't execute pending 
calls recursively, (async notifications is so much better) and 
explained it better in the docs.
msg79504 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-09 20:31
Checked in as revision: 68460
msg79507 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-09 21:12
> Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:
> 
> Checked in as revision: 68460

Looks like you forgot the unit tests!
msg79511 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-09 21:36
I indeed forgot the unittests and docs.
They are now in, in revision: 68461
msg79658 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-12 09:24
Added MISC/News entry in revision: 68545
msg79975 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-16 20:44
Kristján,

The r68461 checkin seems to be causing a number of the buildbots to fail,
typically with output like:

test_capi
test test_capi failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea/trunk.klose-debian-ppc/build/Lib/test/test_capi.py", line 57, in 
test_pendingcalls_threaded
    self.pendingcalls_wait(l, n)
  File "/home/pybot/buildarea/trunk.klose-debian-ppc/build/Lib/test/test_capi.py", line 42, in 
pendingcalls_wait
    "timeout waiting for %i callbacks, got %i"%(n, len(l)))
AssertionError: timeout waiting for 32 callbacks, got 23

Please could you look into this?


Also, I don't understand the code:

for i in xrange(1000):
    a = i*i

in pendingcalls_wait(), in test_capi.py.  Whatever you're
trying to do here, surely there's a better way?
msg79982 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-16 22:31
How about using a timer instead of the 'count += 1' loop:  after starting 
the 32 self.pendingcalls_submit threads, set up a threading.Event and 
start yet another thread that simply does a time.sleep(5.0) (or whatever) 
and then sets that event.

Then your waiting loop could be something like:

while not self.my_event.is_set():
    for i in range(1000):
        a = i*i
self.assertEqual(len(l), n)

There's probably a better way, but that's the best I can come up with this 
late on a Friday night...
msg80091 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-18 10:59
checked in r68722, let's hope this fixes things.
msg80250 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-20 12:02
That seems to have done the trick.  Thanks!
History
Date User Action Args
2009-01-20 12:02:07mark.dickinsonsetstatus: open -> closed
keywords: patch, patch
messages: + msg80250
2009-01-18 10:59:18kristjan.jonssonsetkeywords: patch, patch
messages: + msg80091
2009-01-16 22:31:52mark.dickinsonsetkeywords: patch, patch
messages: + msg79982
2009-01-16 20:44:07mark.dickinsonsetstatus: closed -> open
keywords: patch, patch
messages: + msg79975
nosy: + mark.dickinson
2009-01-12 09:24:33kristjan.jonssonsetkeywords: patch, patch
messages: + msg79658
2009-01-09 21:36:23kristjan.jonssonsetkeywords: patch, patch
messages: + msg79511
2009-01-09 21:12:19pitrousetmessages: + msg79507
2009-01-09 20:31:51kristjan.jonssonsetstatus: open -> closed
keywords: patch, patch
resolution: accepted
messages: + msg79504
2009-01-06 21:42:21kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg79290
2009-01-06 16:12:48pitrousetkeywords: patch, patch
messages: + msg79270
2009-01-06 16:06:02pitrousetkeywords: patch, patch
messages: + msg79269
2009-01-06 13:11:16kristjan.jonssonsetkeywords: patch, patch
files: + testcapi.patch
messages: + msg79255
2009-01-06 09:31:29kristjan.jonssonsetkeywords: patch, patch
messages: + msg79241
2009-01-05 14:56:17pitrousetkeywords: patch, patch
messages: + msg79152
2009-01-05 14:21:10kristjan.jonssonsetkeywords: patch, patch
files: + testcapi.patch
messages: + msg79151
2009-01-05 13:59:08pitrousetkeywords: patch, patch
nosy: + pitrou
messages: + msg79147
2009-01-05 13:44:06kristjan.jonssonsetkeywords: patch, patch
files: + xmlrpc.patch
messages: + msg79142
2009-01-05 12:11:44amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg79132
2009-01-05 11:19:51kristjan.jonssonsetkeywords: patch, patch
messages: + msg79128
2009-01-05 10:29:08amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg79119
2009-01-05 09:27:14kristjan.jonssonsetkeywords: patch, patch
messages: + msg79113
2008-11-17 19:05:34kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg75977
2008-11-17 16:19:31kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg75970
2008-11-17 15:54:35amaury.forgeotdarcsetkeywords: patch, patch
messages: + msg75966
2008-11-17 14:52:39kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg75963
2008-11-14 16:59:49kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg75872
2008-11-12 10:13:10kristjan.jonssonsetkeywords: patch, patch
messages: + msg75783
2008-11-12 10:09:05amaury.forgeotdarcsetkeywords: patch, patch
nosy: + amaury.forgeotdarc
messages: + msg75782
2008-11-10 14:05:18kristjan.jonssonsetfiles: - pendingalls.patch
2008-11-10 14:05:08kristjan.jonssonsetkeywords: patch, patch
files: + pendingalls.patch
messages: + msg75700
2008-11-10 11:59:25kristjan.jonssonsetkeywords: patch, patch
messages: + msg75693
2008-11-10 11:53:27christian.heimessetkeywords: - needs review
nosy: + christian.heimes
stage: patch review
messages: + msg75692
versions: + Python 3.1
2008-11-10 11:36:34kristjan.jonssoncreate