Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread Safe Py_AddPendingCall #48543

Closed
kristjanvalur mannequin opened this issue Nov 10, 2008 · 32 comments
Closed

Thread Safe Py_AddPendingCall #48543

kristjanvalur mannequin opened this issue Nov 10, 2008 · 32 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@kristjanvalur
Copy link
Mannequin

kristjanvalur mannequin commented Nov 10, 2008

BPO 4293
Nosy @amauryfa, @mdickinson, @pitrou, @kristjanvalur, @tiran
Files
  • pendingalls.patch
  • pendingalls.patch: version 3 of the patch
  • pendingalls.patch: avoid deadlocks on the main thread only.
  • pendingalls.patch: new complete patch, checks for NULL pending_lock
  • pendingalls.patch: new version that doesn't assume signals on the main thread
  • xmlrpc.patch: Some fixes suggested by Amaury
  • testcapi.patch: test cases for Py_AddPendingCall()
  • testcapi.patch: testing the asynchronous notify, with suggested changes
  • pendingalls.patch: full patch including revised documentation
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-01-20.12:02:07.568>
    created_at = <Date 2008-11-10.11:36:34.958>
    labels = ['interpreter-core', 'type-crash']
    title = 'Thread Safe Py_AddPendingCall'
    updated_at = <Date 2009-01-20.12:02:07.567>
    user = 'https://github.com/kristjanvalur'

    bugs.python.org fields:

    activity = <Date 2009-01-20.12:02:07.567>
    actor = 'mark.dickinson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-01-20.12:02:07.568>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2008-11-10.11:36:34.958>
    creator = 'kristjan.jonsson'
    dependencies = []
    files = ['11978', '12008', '12031', '12032', '12035', '12594', '12596', '12619', '12624']
    hgrepos = []
    issue_num = 4293
    keywords = ['patch']
    message_count = 32.0
    messages = ['75691', '75692', '75693', '75700', '75782', '75783', '75872', '75963', '75966', '75970', '75977', '79113', '79119', '79128', '79132', '79142', '79147', '79151', '79152', '79241', '79255', '79269', '79270', '79290', '79504', '79507', '79511', '79658', '79975', '79982', '80091', '80250']
    nosy_count = 5.0
    nosy_names = ['amaury.forgeotdarc', 'mark.dickinson', 'pitrou', 'kristjan.jonsson', 'christian.heimes']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue4293'
    versions = ['Python 3.1', 'Python 2.7']

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 10, 2008

    At CCP We have started using the Py_AddPendingCall() mechanism to
    signal python about a completed IO operation.
    However, we noticed that the existing mechanism was hoplelessly un-
    thread safe. This is bad, since on Windows at least, it is very
    convenient to have such callbacks happen on an arbitrary thread from
    the system's thread pool.
    I submit a thread-safe implementation instead to be used if WITH_THREAD
    is defined.
    This allows Py_AddPendingCall() to be called from any thread, from any
    context, even from a PendingCall callback itself.

    @kristjanvalur kristjanvalur mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Nov 10, 2008
    @tiran
    Copy link
    Member

    tiran commented Nov 10, 2008

    Your patch sounds like a useful addition. However I can comment on the
    details because I haven't dealt with Py_AddPendingCall() yet.

    Documentation updates and a unit test (Modules/_testcapimodule.c), that
    shows the bug and verifies your patch, are welcome, too.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 10, 2008

    Good point. I'll make a test using ctypes and _testcapimodule.c to try
    to make it as non-platform-specific as possible.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 10, 2008

    Okay, i add a new patch that includes tests.
    I am unsure where I should add documentation for the Py_AddPendingCall
    () function. Any suggestions?

    @amauryfa
    Copy link
    Member

    Py_AddPendingCall is used inside signal handlers.
    It seems that there is a (tiny) chance that a signal interrupts the
    program inside Py_MakePendingCalls, while the pendinglock is held.
    Py_AddPendingCall would try to acquire this lock again and deadlock.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 12, 2008

    Right.
    Isn't the best way to avoid it to block signals just while we pop the
    queue? I'll see if python has a portable sigaction thing to do that.
    Otherwise we'd have to start to mess with a "volatile busy" flag again
    and possibly introduce tricky race conditions.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 14, 2008

    Here is a revised version.
    Since there is no portable way to block signals, and no way at all on
    windows (apparently) the simplest way is to simply use NOWAIT_LOCK when
    adding a new pending call. While this does not guarantee that we are
    always able to append a call (we may be unlucky in all our N tries and
    there is also no portable way to yield the thread timeslice between
    tries) such are also the semantics of the method. Sometimes the buffer
    may be full, or we fail to get the lock in time, and a -1 is returned.
    It is up to the caller to respond appropriately, perhaps by trying again
    later.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 17, 2008

    Small refinement: Deadlock avoidance is only needed on the main thread
    since we only install signal handlers for the main thread.

    @amauryfa
    Copy link
    Member

    Two comments about the patch:

    • typo in a comment: "...change chat a signal..."

    • PyThread_acquire_lock should not be called with a NULL lock: on win32
      this does not matter (there is a test in thread_nt.h), but Unix
      segfaults in thread_pthread.h if you hit Ctrl-C.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 17, 2008

    I had forgotten that locks aren't initialized until PyEval_InitThreads
    () is called. Now we check for this case. Also reinitialize when fork
    () is called.
    Is there any suggested place where I should add documentation to this?

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Nov 17, 2008

    I turns out that for SIGINT, windows creates a new thread. So, the
    assumption that signals are delivered on the main thread are not valid
    in general. Therefore, we must initialize the lock pending_lock
    independently of PyEval_InitThreads(), and also cannot assume that
    signals are deliverd on the main thread.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 5, 2009

    Ping.
    Amaury, any thoughts?

    @amauryfa
    Copy link
    Member

    amauryfa commented Jan 5, 2009

    • The things_to_do static variable should be declared as "volatile": it
      is read by the main loop without any lock.
      (by the way, could you rename it to something like "pendingcalls_to_do"?)

    • in the old Py_MakePendingCalls function, the "#ifdef WITH_THREAD" part
      is not useful now.

    • the "XXX" comments should probably be rewritten.

    A note to people concerned by performance: the additional lock is used
    only when there is an actual pending call to process. The main loop only
    regularly checks the "things_to_do static" volatile static variable, and
    the patch does not change this.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 5, 2009

    Good points.
    Any suggestion where to document the new functionality?

    @amauryfa
    Copy link
    Member

    amauryfa commented Jan 5, 2009

    Well, I could not find these functions documented anywhere :-(
    If you want to start to write some, a good place could be in
    http://docs.python.org/c-api/init.html (Doc/c-api/init.rst, maybe after
    the PyGILState_ functions)

    At the very least, a short sentence in whatsnew/2.7.rst would be
    appreciated.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 5, 2009

    Ok, here is a revised, patch.
    It now contains rudimentary documentation, which probably needs to be
    proofread by the official documentation lords.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2009

    Would it be possible to rework your code so as not to define two
    versions of Py_AddPendingCall() and Py_MakePendingCalls(), only putting
    #ifdef WITH_THREAD sections where it's necessary? Or would it make the
    code too quirky?

    Also, it would be nice to have a test for this specific feature (e.g.
    using a dedicated function in the _testcapi module), but perhaps it
    isn't a showstopper either.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 5, 2009

    Ah, I forgot the test code in my patch. I submit a separate patch to
    add that code.
    Also, reworking things to have single function definitions is in my
    opinion too messy. I started doing things that way, but the code
    becomes hard to read and therefore to understand and verify.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2009

    A few comments about the test:

    • it should be a method of TestCAPI; we try to unittest everywhere,
      although there is some old code which predates that
    • the implementation would be stressed better if each callback was added
      from a separate thread, rather than adding them all at once; it will
      also make the C function in _testcapi simpler (and you can do the sleep
      in the Python code in test_capi.py)
    • if you want the list operation in your callback to be atomic, it
      should append() something to the list rather than increment its first
      element; then you just have to test for the len() of the list
    • in _pending_callback(), you must use Py_XDECREF(r), not Py_DECREF,
      since r can be NULL

    I will look at the ceval part of the patch later :)

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 6, 2009

    Thanks.
    What do you mean by making the test a method of TestCAPI? I have found
    no such class. Are you talking about the python part or the c part?

    Also, atomicity in the callback is not required, as the callback is
    never interrupted implicitly. In fact, that would make it quite
    useless. I'll add that to the documentation.

    As for your other suggestions, they're noted, I'll try having more
    threads.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 6, 2009

    Ok, I have addressed the issues raised about testcapi. A new file has
    been submitted.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2009

    Ok, the test patch is fine!

    Now looking at the main patch, some comments:

    • you removed "volatile" from pendingfirst and pendinglast, is it really
      safe?
    • what if pending_lock is not initialized when Py_AddPendingCall is
      called? I know it's unlikely, but imagine someone calling that function
      just after the interpreter has been initialized
    • the documentation should mention that the GIL must not be held when
      Py_AddPendingCall is called, otherwise there can be a deadlock (is it a
      change from previous versions by the way?)
    • you should check the return value of PyThread_allocate_lock() for NULL
    • is your implementation reentrant? that is, registering a callback from
      a callback. There is no need for it to be, but if it aims to be, then it
      should perhaps be documented and tested :)

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2009

    Sorry, forget what I said about the GIL, it looks fine. Perhaps some
    comments should be added to clarify that matter, however.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 6, 2009

    1. volatile is not required when access is guarded by locks.
    2. pendingcalls_to_do is initialized to 1, so the very first thing that
      the interpreter does (after executing one bytecode, I init _PyTicker to
      0 now) is to initialize this lock. Creating a lock really must be done
      by one well known startup thread only. There is a lot of functions in
      the python API that don't work until after python initialization
      (without that being explicitly documented), Py_AddPendingCall() is one
      of them. Btw, there is no generic "init" function for the ceval.c
      module. And PyEval_InitThreads isn't really suitable since we must
      have the lock, even if we haven't initialized the threads, to guard us
      against reentrancy during signal delivery. For extra safety, I have
      added a NULL test in Py_AddPendingCall().
      3)Py_AddPendingCall() can be called with our without the GIL.
      4)I've added a check in Py_MakePendingCalls(). In PyEval_ReInitThreads
      there is no way to do anything sensible in the case of error.
      5)Yes it is reentrant. The pending_lock is only held for a short while
      while we pop a callback off the queue. However, because even during
      that short while a signal can be delivered which causes
      Py_AddPendingCall() to be called, the latter function fails if it
      doesn't acquire the lock in a reasonable amount of time. This would
      cause a signal to be lost, and not a deadlock. This hasn't changed
      from the old implementation.

    I have also added a flag to make sure that we don't execute pending
    calls recursively, (async notifications is so much better) and
    explained it better in the docs.

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 9, 2009

    Checked in as revision: 68460

    @kristjanvalur kristjanvalur mannequin closed this as completed Jan 9, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Jan 9, 2009

    Kristján Valur Jónsson <kristjan@ccpgames.com> added the comment:

    Checked in as revision: 68460

    Looks like you forgot the unit tests!

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 9, 2009

    I indeed forgot the unittests and docs.
    They are now in, in revision: 68461

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 12, 2009

    Added MISC/News entry in revision: 68545

    @mdickinson
    Copy link
    Member

    Kristján,

    The r68461 checkin seems to be causing a number of the buildbots to fail,
    typically with output like:

    test_capi
    test test_capi failed -- Traceback (most recent call last):
      File "/home/pybot/buildarea/trunk.klose-debian-ppc/build/Lib/test/test_capi.py", line 57, in 
    test_pendingcalls_threaded
        self.pendingcalls_wait(l, n)
      File "/home/pybot/buildarea/trunk.klose-debian-ppc/build/Lib/test/test_capi.py", line 42, in 
    pendingcalls_wait
        "timeout waiting for %i callbacks, got %i"%(n, len(l)))
    AssertionError: timeout waiting for 32 callbacks, got 23

    Please could you look into this?

    Also, I don't understand the code:

    for i in xrange(1000):
        a = i*i

    in pendingcalls_wait(), in test_capi.py. Whatever you're
    trying to do here, surely there's a better way?

    @mdickinson mdickinson reopened this Jan 16, 2009
    @mdickinson
    Copy link
    Member

    How about using a timer instead of the 'count += 1' loop: after starting
    the 32 self.pendingcalls_submit threads, set up a threading.Event and
    start yet another thread that simply does a time.sleep(5.0) (or whatever)
    and then sets that event.

    Then your waiting loop could be something like:

    while not self.my_event.is_set():
        for i in range(1000):
            a = i*i
    self.assertEqual(len(l), n)

    There's probably a better way, but that's the best I can come up with this
    late on a Friday night...

    @kristjanvalur
    Copy link
    Mannequin Author

    kristjanvalur mannequin commented Jan 18, 2009

    checked in r68722, let's hope this fixes things.

    @mdickinson
    Copy link
    Member

    That seems to have done the trick. Thanks!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants