msg106815 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2010-05-31 22:20 |
invoking select.poll.poll() concurrently from multiple threads frequently yields garbage in one thread:
while poll_poll in thread 1 is parsing its result, another thread 2 calling poll may overwrite revents; assuming poll_result was 1 in thread 1 and thread 2 managed to clear all revents before thread 1 started scanning ufds, thread 1 would iterate straight through all ufds, past its bounds (no bound checks there), and return the first out-of-bounds entry that happens to have revents != 0
this issue needs at least documentation (although bounds-checking to prevent garbage in the result wouldn't hurt)
also, since there doesn't seem to be any locking w/ regards to ufds, it might be possible to corrupt python's heap, by concurrently invoking poll_register and poll_poll. poll_register could move the ufds array to another location while resizing it and poll_poll would subsequently overwrite memory that is not allocated anymore or allocated by someone else (did not test that)
python 2.5.5
|
msg106816 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-05-31 22:24 |
Do you have a script that reproduces it?
|
msg106827 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2010-06-01 06:35 |
okay, I've managed to put together a script that fairly consistently reproduces the problem on my (core2 duo 2,something GHz) machine
some parameters (sleep times) are randomized in each iteration, the script runs for 30 iterations, each for 1 second, and counts the number of events where the fd looks bogus (max 1 per iteration); some other events (fd correct, revents bogus are also reported, but not counted)
example output:
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-1, 65535)], we were looking for fd 3
poll returned [(-1, 65535)], we were looking for fd 3
poll returned [(-1627358347, 60497)], we were looking for fd 3
should not get here; poll returned [(3, 0)]
poll returned [(-3, 65535)], we were looking for fd 3
9 events in 30 iterations
|
msg106829 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2010-06-01 07:09 |
the other issue (poll_register freeing data structures that poll_poll currently uses) can be reproduced by the attached script (at least I can)
depending on the parameters/circumstances it either segfaults or prints garbage results from poll, e.g. [(1684955474, 28192)]; the expected result would be [(3, 1)]
|
msg107863 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2010-06-15 10:21 |
added a patch which fixes both issues
before releasing the GIL we take a copy of the ufds pointer and its len, erasing the ufds pointer in the poll object (to make sure nobody else fiddles with it); when we're done we either but it back into the object (when it's still NULL) or we free it
the update logic is modified as well, since the uptodate flag is not sufficient anymore, we now count up a tag for each modification of the poll dictionary and remember that tag together with the ufds pointer
the result is:
- for the single-threaded case we do little extra work (moving ufds from/to poll object)
- correct for the multi-threaded case, with slightly higher overhead (one additional call to each of malloc, update_ufd_array, free), probably worse than having one poll object per thread, but not worse than allocating a new poll object each time we want to poll
there still is potential for incorrect poll results (but not for memory corruption): when poll_register/poll_unregister are called exactly 2**32 times (or multiples thereof) while poll_poll is running, it will happily put back its outdated ufds pointer into the poll object when its done, this could be alleviated by changing tag to long long ... which is unlikely to wrap around anytime soon.
|
msg175387 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-11 19:48 |
I was unable to apply the patch automatically, so I had to do it manually. Here is an updated patch for review. I did not consider it in detail yet, but it seems to correct these errors.
|
msg175389 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2012-11-11 19:55 |
i'm looking at getting this in.
|
msg175390 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2012-11-11 21:21 |
Christian Schubert (apexo) - Would you please submit a PSF contributor agreement form?
http://www.python.org/psf/contrib/
http://www.python.org/psf/contrib/contrib-form-python/
thanks!
|
msg175394 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-11 21:50 |
In general the patch looks good to me. I only get rid of non-needed macros.
|
msg175395 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2012-11-11 21:51 |
here's an updated patch.
it strikes me that this should not be a very common problem. how many applications are going to share the same poll object _across_ multiple threads?
if they do and the file descriptor they'll be spending a lot of time mallocing and freeing new pollfd ufds arrays in each thread as a result as they all fight for ownership of the pollObject's ufds.
|
msg175397 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2012-11-11 21:54 |
our patches are similar. i updated it to use long long and Py_ssize_t and Py_CLEAR and Py_RETURN_NONE in a few places and added comments.
getting rid of the CLEAR_UFDS macro as you did is a good idea.
|
msg175400 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-11-11 22:10 |
> it strikes me that this should not be a very common problem. how many
> applications are going to share the same poll object _across_ multiple
> threads?
Indeed that doesn't sound very likely. How about raising an error on concurrent modification, instead of trying to make it thread-safe?
|
msg175409 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2012-11-11 23:44 |
> How about raising an error on concurrent modification, instead of trying to make it thread-safe?
That's totally fine with me.
|
msg175429 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2012-11-12 09:22 |
new proposed fix: forbid concurrent poll() invocation
|
msg175431 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-12 09:57 |
The patch LGTM. I doubt about the exception type. May be RuntimeError is more appropriate?
|
msg175432 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2012-11-12 10:02 |
> I doubt about the exception type. May be RuntimeError is more appropriate?
mea culpa, just copy&pasted without actually looking; fixed in v3
|
msg175501 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2012-11-13 15:59 |
> Would you please submit a PSF contributor agreement form?
FYI: did that yesterday 9:43 UTC, no reaction (yet)
|
msg175525 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-11-13 20:45 |
> > Would you please submit a PSF contributor agreement form?
>
> FYI: did that yesterday 9:43 UTC, no reaction (yet)
Don't worry, it can take some time to process. We can still apply your
patch, though, since you said you sent a contributor agreement.
|
msg176894 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-12-04 10:44 |
As I see, Christian's contributor agreement already confirmed. Let's go ahead.
|
msg176895 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-12-04 10:48 |
Christian, it would be nice if you write the tests.
|
msg176969 - (view) |
Author: Christian Schubert (apexo) * |
Date: 2012-12-05 07:52 |
What's a proper test for this? Testing, that the (now expected) exception occurs when invoking poll concurrently? Or rather, that the race condition does not occur? Last time I checked, pypy handled the concurrent poll invocation well, so it would fail both tests.
|
msg176998 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-12-05 19:19 |
Test should check changed behavior. I.e. it should test that calling poll concurrently raises an exception.
|
msg179657 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-01-11 09:00 |
This patch should be updated to also fix epoll().
Also, is it right to raise an exception in case of concurrent invocation?
Here's a simple script that crashes systematically on my Linux box.
|
msg179659 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-11 09:19 |
> Also, is it right to raise an exception in case of concurrent invocation?
It is right for poll() because it was not concurrently usable in previous versions in any case. For epoll() it is an another issue.
|
msg180696 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-26 18:15 |
Here is a test made from Charles-François's crasher. Let's go.
|
msg181822 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-02-10 17:37 |
Ping.
|
msg181823 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-02-10 17:44 |
Patches look good to me.
|
msg186736 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-04-13 15:59 |
Gregory?
|
msg194334 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-04 08:37 |
Gregory, could you commit the patch or allow me to do this?
|
msg195705 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-08-20 17:59 |
New changeset 072ba5df77e4 by Serhiy Storchaka in branch '3.3':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/072ba5df77e4
New changeset 4543408e2ba6 by Serhiy Storchaka in branch 'default':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/4543408e2ba6
New changeset a4091c1de27a by Serhiy Storchaka in branch '2.7':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/a4091c1de27a
|
msg195707 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-08-20 18:23 |
Thank you for the report and the patch Christian.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:01 | admin | set | github: 53111 |
2013-08-20 18:23:29 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg195707
stage: commit review -> resolved |
2013-08-20 17:59:34 | python-dev | set | nosy:
+ python-dev messages:
+ msg195705
|
2013-08-20 14:47:22 | serhiy.storchaka | set | assignee: gregory.p.smith -> serhiy.storchaka versions:
- Python 3.2 |
2013-08-04 17:25:37 | vstinner | set | nosy:
+ vstinner
|
2013-08-04 08:37:10 | serhiy.storchaka | set | messages:
+ msg194334 |
2013-04-13 15:59:02 | serhiy.storchaka | set | messages:
+ msg186736 |
2013-02-10 17:44:54 | pitrou | set | messages:
+ msg181823 |
2013-02-10 17:37:40 | serhiy.storchaka | set | messages:
+ msg181822 |
2013-01-26 18:17:05 | serhiy.storchaka | set | stage: test needed -> commit review |
2013-01-26 18:15:19 | serhiy.storchaka | set | files:
+ issue8865_test.patch
messages:
+ msg180696 |
2013-01-11 09:19:16 | serhiy.storchaka | set | messages:
+ msg179659 |
2013-01-11 09:00:32 | neologix | set | files:
+ test.py nosy:
+ neologix messages:
+ msg179657
|
2013-01-11 08:58:36 | neologix | unlink | issue16929 dependencies |
2013-01-11 08:58:36 | neologix | link | issue16929 superseder |
2013-01-11 08:52:14 | serhiy.storchaka | link | issue16929 dependencies |
2012-12-05 19:19:43 | serhiy.storchaka | set | messages:
+ msg176998 |
2012-12-05 10:53:52 | asvetlov | set | nosy:
+ asvetlov
|
2012-12-05 07:52:14 | apexo | set | messages:
+ msg176969 |
2012-12-04 10:48:04 | serhiy.storchaka | set | messages:
+ msg176895 |
2012-12-04 10:44:30 | serhiy.storchaka | set | messages:
+ msg176894 |
2012-11-13 20:45:30 | pitrou | set | messages:
+ msg175525 |
2012-11-13 15:59:15 | apexo | set | messages:
+ msg175501 |
2012-11-12 10:02:00 | apexo | set | files:
+ issue8865_v3.diff
messages:
+ msg175432 |
2012-11-12 09:57:11 | serhiy.storchaka | set | messages:
+ msg175431 |
2012-11-12 09:22:10 | apexo | set | files:
+ issue8865_v2.diff
messages:
+ msg175429 |
2012-11-11 23:44:34 | apexo | set | messages:
+ msg175409 |
2012-11-11 22:10:08 | pitrou | set | messages:
+ msg175400 |
2012-11-11 21:54:08 | gregory.p.smith | set | messages:
+ msg175397 |
2012-11-11 21:51:05 | gregory.p.smith | set | files:
+ issue8865-poll-multithread-gps01.diff
messages:
+ msg175395 |
2012-11-11 21:50:31 | serhiy.storchaka | set | files:
+ select_concurrent_poll.patch
messages:
+ msg175394 stage: test needed |
2012-11-11 21:21:22 | gregory.p.smith | set | messages:
+ msg175390 |
2012-11-11 19:55:29 | gregory.p.smith | set | assignee: gregory.p.smith messages:
+ msg175389 |
2012-11-11 19:48:10 | serhiy.storchaka | set | files:
+ select_concurrent_poll.patch type: behavior -> crash messages:
+ msg175387
versions:
+ Python 3.3, Python 3.4, - Python 2.6, Python 3.1 |
2012-11-11 16:52:34 | pitrou | set | nosy:
+ serhiy.storchaka
|
2012-11-11 04:08:53 | gregory.p.smith | set | assignee: gregory.p.smith -> (no value) |
2012-06-07 20:38:40 | gregory.p.smith | set | assignee: gregory.p.smith
nosy:
+ gregory.p.smith |
2010-06-15 10:31:07 | eric.araujo | set | nosy:
- docs@python
|
2010-06-15 10:22:26 | pitrou | set | nosy:
+ exarkun
|
2010-06-15 10:21:28 | apexo | set | files:
+ select.patch keywords:
+ patch messages:
+ msg107863
|
2010-06-01 07:09:26 | apexo | set | files:
+ concurrent_poll_and_register.py
messages:
+ msg106829 |
2010-06-01 06:35:20 | apexo | set | files:
+ select_threads.py
messages:
+ msg106827 |
2010-05-31 22:24:08 | pitrou | set | versions:
+ Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5 nosy:
+ pitrou
messages:
+ msg106816
assignee: docs@python -> (no value) components:
- Documentation |
2010-05-31 22:20:13 | apexo | create | |