classification
Title: thread-safety issue in regrtest.main()
Type: Stage: resolved
Components: Tests Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, chris.jerdonek, flox, pitrou, python-dev, rhettinger
Priority: normal Keywords: easy, patch

Created on 2012-07-10 20:50 by chris.jerdonek, last changed 2012-07-25 22:55 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
listiter_next_atomic.patch amaury.forgeotdarc, 2012-07-11 11:44 review
issue-15320-1.patch chris.jerdonek, 2012-07-11 20:17 review
issue-15320-2.patch chris.jerdonek, 2012-07-12 01:29 review
issue-15320-3.patch chris.jerdonek, 2012-07-13 03:18 review
issue-15320-4.patch chris.jerdonek, 2012-07-15 23:02 review
Messages (21)
msg165203 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-10 20:50
My understanding is that generators are not thread-safe.  For example, see

http://stackoverflow.com/a/1131458/262819

However, regrtest.main() seems to access a generator from multiple threads when run in multiprocess mode:

def work():
    # A worker thread.
    try:
        while True:
            try:
                test, args_tuple = next(pending)
            except StopIteration:
                output.put((None, None, None, None))
                return

http://hg.python.org/cpython/file/2ecdda96f970/Lib/test/regrtest.py#l632
msg165233 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-07-11 08:09
There is indeed a race condition here.  Fortunately unit tests take much more time than the generator loop.

Is it enough to turn the generator into a fixed list? Or is the "late binding" behavior of args_tuple important? (For example, if the main thread changes the timeout variable, subsequent tests would see the modified value)
msg165250 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-11 11:17
I don't think the late binding is necessary. But it looks like late binding could be preserved simply by constructing args_tuple inside the worker thread instead of in the generator. Really, only "test" needs to be yielded. Nothing else varies in the loop.

Is popping from a list thread safe, or were you thinking of another queue?
msg165253 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-07-11 11:44
I was about to say "yes, listiter.__next__ is atomic" (it should, really), but that's not true (Hint: there is a Py_DECREF in the code...).

The script below crashes with:
    Fatal Python error: Objects/listobject.c:2774 object at 0x7f66b7a6c600 has negative ref count -1
Attached patch fixes it.


import threading
class C:
    def __del__(self):
        print("DEL")
ITER = iter([C() for x in range(100)])
def f():
    for obj in ITER:
        pass
for x in range(10):
    t = threading.Thread(target=f)
    t.start()
msg165263 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-11 16:13
Amaury's patch looks obviously fine.

As for the original issue: yes, I thought I saw a traceback once due to that. Using list.pop() (or deque.pop()) instead would probably be good enough.
msg165270 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-11 20:17
Attaching a patch for the original issue using deque.
msg165272 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-11 20:34
Hmm, actually the patch is not ok, because of the -F option which uses an infinite iterator.
msg165278 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-12 01:29
Good catch.  Here is a patch that takes --forever mode into account.

I wrote this patch with the assumption that it shouldn't hurt if multiple threads call deque.extend() at the same time.
msg165286 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-07-12 07:46
> I wrote this patch with the assumption that it shouldn't hurt
> if multiple threads call deque.extend() at the same time.

By looking at the implementation, I found that if multiple threads call dequeue.extend() at the same time, all items will be added, but the order is not deterministic.
It probably does not matter for tests.

In any case, I think we should not rely on this specific implementation.  We should document which functions are safe to use in a multithreading environment.
msg165288 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-12 08:28
That sounds fine.  And thanks for investigating.

By the way, I created issue 15329 earlier today to clarify what guarantees deque provides with respect to multithreading.  For example, the distinction between thread-safe and atomic is not currently mentioned.  As you observed, deque.extend() is not atomic but I'm guessing is probably considered safe.
msg165306 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-07-12 15:03
Why not use locks to guard critical sections rather than relying on implementation details regarding atomicity?
msg165308 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-12 15:10
Yes, that is what I took Amaury's comment to mean.  I started working on a patch that incorporates a lock.
msg165311 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-12 15:37
> Yes, that is what I took Amaury's comment to mean. I started working on a patch that incorporates a lock.

For the sake of clarity, I think Raymond suggests using a lock in regrtest, not in deque.
msg165349 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-13 03:18
Here is another patch -- this one making no implementation assumptions about thread-safety or atomicity.
msg165385 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-13 14:04
I find the patch complicated. If you are using a Lock, surely you don't need a deque, you can keep the original iterator (which is also more readable since it mirrors the semantics quite closely)?
msg165552 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-15 23:02
Thanks a lot for the feedback.  The thinking was to use a stand-alone (even testable) construct with dependencies made explicit.  For example, it wasn't obvious that the current iterator depended on forever, or whether the args_tuple parameter was a necessary part of it.  But I agree it should be simplified.

Here is a simpler patch, keeping iterator semantics.  My preference is to avoid adding lock semantics and additional complexity directly to main(), even though doing so would permit a smaller patch.
msg166426 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-25 21:24
Antoine, does the latest patch look okay to you, or did you want something even more minimal (e.g. by defining and acquiring the lock directly in main)?
msg166427 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-25 21:27
I would have taken the "simpler" route myself (taking the lock around the next() call), that said it looks ok as it is to me. Do other people have an opinion?
msg166432 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-07-25 22:36
Thanks, Antoine. Just a note: with that other implementation I think the call to pending.close() would probably also warrant a lock since it appears close() can't be called either if a generator is already executing (granted that situation would be even rarer since it could arise only during KeyboardInterrupt).
msg166436 - (view) Author: Roundup Robot (python-dev) Date: 2012-07-25 22:54
New changeset d7a64e095930 by Antoine Pitrou in branch '3.2':
Issue #15320: Make iterating the list of tests thread-safe when running tests in multiprocess mode.
http://hg.python.org/cpython/rev/d7a64e095930

New changeset 43ae2a243eca by Antoine Pitrou in branch 'default':
Issue #15320: Make iterating the list of tests thread-safe when running tests in multiprocess mode.
http://hg.python.org/cpython/rev/43ae2a243eca
msg166437 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-25 22:55
Ok, I've committed the patch to 3.2 and 3.3. I don't really want to bother with 2.7 (it's a rare enough issue). Thank you Chris!
History
Date User Action Args
2012-07-26 15:26:36rosslagerwalllinkissue7996 superseder
2012-07-25 22:55:12pitrousetstatus: open -> closed
versions: - Python 2.7
messages: + msg166437

resolution: fixed
stage: patch review -> resolved
2012-07-25 22:54:03python-devsetnosy: + python-dev
messages: + msg166436
2012-07-25 22:36:37chris.jerdoneksetmessages: + msg166432
2012-07-25 21:27:46pitrousetmessages: + msg166427
2012-07-25 21:24:39chris.jerdoneksetmessages: + msg166426
2012-07-15 23:02:28chris.jerdoneksetfiles: + issue-15320-4.patch

messages: + msg165552
2012-07-13 14:04:59pitrousetmessages: + msg165385
2012-07-13 03:18:33chris.jerdoneksetfiles: + issue-15320-3.patch

messages: + msg165349
2012-07-12 15:37:17pitrousetmessages: + msg165311
2012-07-12 15:10:48chris.jerdoneksetmessages: + msg165308
2012-07-12 15:03:34rhettingersetnosy: + rhettinger
messages: + msg165306
2012-07-12 08:28:15chris.jerdoneksetmessages: + msg165288
2012-07-12 07:46:43amaury.forgeotdarcsetmessages: + msg165286
2012-07-12 01:29:24chris.jerdoneksetfiles: + issue-15320-2.patch

messages: + msg165278
2012-07-11 20:34:44pitrousetmessages: + msg165272
2012-07-11 20:17:24chris.jerdoneksetfiles: + issue-15320-1.patch

messages: + msg165270
2012-07-11 16:17:33floxsetnosy: + flox
2012-07-11 16:13:02pitrousetstage: patch review
messages: + msg165263
versions: + Python 2.7, Python 3.2
2012-07-11 11:44:31amaury.forgeotdarcsetfiles: + listiter_next_atomic.patch
keywords: + patch
messages: + msg165253
2012-07-11 11:17:55chris.jerdoneksetmessages: + msg165250
2012-07-11 08:09:53amaury.forgeotdarcsetnosy: + amaury.forgeotdarc, pitrou
messages: + msg165233
2012-07-10 20:50:37chris.jerdonekcreate