classification
Title: select.poll is not thread safe
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: apexo, asvetlov, exarkun, gregory.p.smith, haypo, neologix, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2010-05-31 22:20 by apexo, last changed 2013-08-20 18:23 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
select_threads.py apexo, 2010-06-01 06:35 python2 script that demonstrates the problem
concurrent_poll_and_register.py apexo, 2010-06-01 07:09 python2 script to demonstrate memory corruption issue
select.patch apexo, 2010-06-15 10:21 a patch
select_concurrent_poll.patch serhiy.storchaka, 2012-11-11 19:48 Updated to 2.7 for review patch review
select_concurrent_poll.patch serhiy.storchaka, 2012-11-11 21:50 review
issue8865-poll-multithread-gps01.diff gregory.p.smith, 2012-11-11 21:51 updated, against 3.2 review
issue8865_v2.diff apexo, 2012-11-12 09:22 forbid concurrent poll() invocation review
issue8865_v3.diff apexo, 2012-11-12 10:02 RuntimeError instead of TypeError review
test.py neologix, 2013-01-11 09:00 crasher
issue8865_test.patch serhiy.storchaka, 2013-01-26 18:15 review
Messages (31)
msg106815 - (view) Author: Christian Schubert (apexo) * Date: 2010-05-31 22:20
invoking select.poll.poll() concurrently from multiple threads frequently yields garbage in one thread:

while poll_poll in thread 1 is parsing its result, another thread 2 calling poll may overwrite revents; assuming poll_result was 1 in thread 1 and thread 2 managed to clear all revents before thread 1 started scanning ufds, thread 1 would iterate straight through all ufds, past its bounds (no bound checks there), and return the first out-of-bounds entry that happens to have revents != 0

this issue needs at least documentation (although bounds-checking to prevent garbage in the result wouldn't hurt)

also, since there doesn't seem to be any locking w/ regards to ufds, it might be possible to corrupt python's heap, by concurrently invoking poll_register and poll_poll. poll_register could move the ufds array to another location while resizing it and poll_poll would subsequently overwrite memory that is not allocated anymore or allocated by someone else (did not test that)

python 2.5.5
msg106816 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-31 22:24
Do you have a script that reproduces it?
msg106827 - (view) Author: Christian Schubert (apexo) * Date: 2010-06-01 06:35
okay, I've managed to put together a script that fairly consistently reproduces the problem on my (core2 duo 2,something GHz) machine

some parameters (sleep times) are randomized in each iteration, the script runs for 30 iterations, each for 1 second, and counts the number of events where the fd looks bogus (max 1 per iteration); some other events (fd correct, revents bogus are also reported, but not counted)

example output:
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-3, 65535)], we were looking for fd 3
poll returned [(-1, 65535)], we were looking for fd 3
poll returned [(-1, 65535)], we were looking for fd 3
poll returned [(-1627358347, 60497)], we were looking for fd 3
should not get here; poll returned [(3, 0)]
poll returned [(-3, 65535)], we were looking for fd 3
9 events in 30 iterations
msg106829 - (view) Author: Christian Schubert (apexo) * Date: 2010-06-01 07:09
the other issue (poll_register freeing data structures that poll_poll currently uses) can be reproduced by the attached script (at least I can)

depending on the parameters/circumstances it either segfaults or prints garbage results from poll, e.g. [(1684955474, 28192)]; the expected result would be [(3, 1)]
msg107863 - (view) Author: Christian Schubert (apexo) * Date: 2010-06-15 10:21
added a patch which fixes both issues

before releasing the GIL we take a copy of the ufds pointer and its len, erasing the ufds pointer in the poll object (to make sure nobody else fiddles with it); when we're done we either but it back into the object (when it's still NULL) or we free it

the update logic is modified as well, since the uptodate flag is not sufficient anymore, we now count up a tag for each modification of the poll dictionary and remember that tag together with the ufds pointer

the result is:
- for the single-threaded case we do little extra work (moving ufds from/to poll object)
- correct for the multi-threaded case, with slightly higher overhead (one additional call to each of malloc, update_ufd_array, free), probably worse than having one poll object per thread, but not worse than allocating a new poll object each time we want to poll

there still is potential for incorrect poll results (but not for memory corruption): when poll_register/poll_unregister are called exactly 2**32 times (or multiples thereof) while poll_poll is running, it will happily put back its outdated ufds pointer into the poll object when its done, this could be alleviated by changing tag to long long ... which is unlikely to wrap around anytime soon.
msg175387 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-11 19:48
I was unable to apply the patch automatically, so I had to do it manually.  Here is an updated patch for review.  I did not consider it in detail yet, but it seems to correct these errors.
msg175389 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 19:55
i'm looking at getting this in.
msg175390 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 21:21
Christian Schubert (apexo) - Would you please submit a PSF contributor agreement form?

http://www.python.org/psf/contrib/
http://www.python.org/psf/contrib/contrib-form-python/

thanks!
msg175394 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-11 21:50
In general the patch looks good to me.  I only get rid of non-needed macros.
msg175395 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 21:51
here's an updated patch.

it strikes me that this should not be a very common problem.  how many applications are going to share the same poll object _across_ multiple threads?

if they do and the file descriptor they'll be spending a lot of time mallocing and freeing new pollfd ufds arrays in each thread as a result as they all fight for ownership of the pollObject's ufds.
msg175397 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-11-11 21:54
our patches are similar.  i updated it to use long long and Py_ssize_t and Py_CLEAR and Py_RETURN_NONE in a few places and added comments.

getting rid of the CLEAR_UFDS macro as you did is a good idea.
msg175400 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-11 22:10
> it strikes me that this should not be a very common problem.  how many 
> applications are going to share the same poll object _across_ multiple 
> threads?

Indeed that doesn't sound very likely. How about raising an error on concurrent modification, instead of trying to make it thread-safe?
msg175409 - (view) Author: Christian Schubert (apexo) * Date: 2012-11-11 23:44
> How about raising an error on concurrent modification, instead of trying to make it thread-safe?

That's totally fine with me.
msg175429 - (view) Author: Christian Schubert (apexo) * Date: 2012-11-12 09:22
new proposed fix: forbid concurrent poll() invocation
msg175431 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 09:57
The patch LGTM.  I doubt about the exception type.  May be RuntimeError is more appropriate?
msg175432 - (view) Author: Christian Schubert (apexo) * Date: 2012-11-12 10:02
> I doubt about the exception type.  May be RuntimeError is more appropriate?

mea culpa, just copy&pasted without actually looking; fixed in v3
msg175501 - (view) Author: Christian Schubert (apexo) * Date: 2012-11-13 15:59
> Would you please submit a PSF contributor agreement form?

FYI: did that yesterday 9:43 UTC, no reaction (yet)
msg175525 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-13 20:45
> > Would you please submit a PSF contributor agreement form?
> 
> FYI: did that yesterday 9:43 UTC, no reaction (yet)

Don't worry, it can take some time to process. We can still apply your
patch, though, since you said you sent a contributor agreement.
msg176894 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-04 10:44
As I see, Christian's contributor agreement already confirmed. Let's go ahead.
msg176895 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-04 10:48
Christian, it would be nice if you write the tests.
msg176969 - (view) Author: Christian Schubert (apexo) * Date: 2012-12-05 07:52
What's a proper test for this? Testing, that the (now expected) exception occurs when invoking poll concurrently? Or rather, that the race condition does not occur? Last time I checked, pypy handled the concurrent poll invocation well, so it would fail both tests.
msg176998 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-05 19:19
Test should check changed behavior. I.e. it should test that calling poll concurrently raises an exception.
msg179657 - (view) Author: Charles-Fran├žois Natali (neologix) * (Python committer) Date: 2013-01-11 09:00
This patch should be updated to also fix epoll().

Also, is it right to raise an exception in case of concurrent invocation?

Here's a simple script that crashes systematically on my Linux box.
msg179659 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-11 09:19
> Also, is it right to raise an exception in case of concurrent invocation?

It is right for poll() because it was not concurrently usable in previous versions in any case. For epoll() it is an another issue.
msg180696 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-26 18:15
Here is a test made from Charles-Fran├žois's crasher. Let's go.
msg181822 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-10 17:37
Ping.
msg181823 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-10 17:44
Patches look good to me.
msg186736 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-13 15:59
Gregory?
msg194334 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-04 08:37
Gregory, could you commit the patch or allow me to do this?
msg195705 - (view) Author: Roundup Robot (python-dev) Date: 2013-08-20 17:59
New changeset 072ba5df77e4 by Serhiy Storchaka in branch '3.3':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/072ba5df77e4

New changeset 4543408e2ba6 by Serhiy Storchaka in branch 'default':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/4543408e2ba6

New changeset a4091c1de27a by Serhiy Storchaka in branch '2.7':
Issue #8865: Concurrent invocation of select.poll.poll() now raises a
http://hg.python.org/cpython/rev/a4091c1de27a
msg195707 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-20 18:23
Thank you for the report and the patch Christian.
History
Date User Action Args
2013-08-20 18:23:29serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg195707

stage: commit review -> resolved
2013-08-20 17:59:34python-devsetnosy: + python-dev
messages: + msg195705
2013-08-20 14:47:22serhiy.storchakasetassignee: gregory.p.smith -> serhiy.storchaka
versions: - Python 3.2
2013-08-04 17:25:37hayposetnosy: + haypo
2013-08-04 08:37:10serhiy.storchakasetmessages: + msg194334
2013-04-13 15:59:02serhiy.storchakasetmessages: + msg186736
2013-02-10 17:44:54pitrousetmessages: + msg181823
2013-02-10 17:37:40serhiy.storchakasetmessages: + msg181822
2013-01-26 18:17:05serhiy.storchakasetstage: test needed -> commit review
2013-01-26 18:15:19serhiy.storchakasetfiles: + issue8865_test.patch

messages: + msg180696
2013-01-11 09:19:16serhiy.storchakasetmessages: + msg179659
2013-01-11 09:00:32neologixsetfiles: + test.py
nosy: + neologix
messages: + msg179657

2013-01-11 08:58:36neologixunlinkissue16929 dependencies
2013-01-11 08:58:36neologixlinkissue16929 superseder
2013-01-11 08:52:14serhiy.storchakalinkissue16929 dependencies
2012-12-05 19:19:43serhiy.storchakasetmessages: + msg176998
2012-12-05 10:53:52asvetlovsetnosy: + asvetlov
2012-12-05 07:52:14apexosetmessages: + msg176969
2012-12-04 10:48:04serhiy.storchakasetmessages: + msg176895
2012-12-04 10:44:30serhiy.storchakasetmessages: + msg176894
2012-11-13 20:45:30pitrousetmessages: + msg175525
2012-11-13 15:59:15apexosetmessages: + msg175501
2012-11-12 10:02:00apexosetfiles: + issue8865_v3.diff

messages: + msg175432
2012-11-12 09:57:11serhiy.storchakasetmessages: + msg175431
2012-11-12 09:22:10apexosetfiles: + issue8865_v2.diff

messages: + msg175429
2012-11-11 23:44:34apexosetmessages: + msg175409
2012-11-11 22:10:08pitrousetmessages: + msg175400
2012-11-11 21:54:08gregory.p.smithsetmessages: + msg175397
2012-11-11 21:51:05gregory.p.smithsetfiles: + issue8865-poll-multithread-gps01.diff

messages: + msg175395
2012-11-11 21:50:31serhiy.storchakasetfiles: + select_concurrent_poll.patch

messages: + msg175394
stage: test needed
2012-11-11 21:21:22gregory.p.smithsetmessages: + msg175390
2012-11-11 19:55:29gregory.p.smithsetassignee: gregory.p.smith
messages: + msg175389
2012-11-11 19:48:10serhiy.storchakasetfiles: + select_concurrent_poll.patch
type: behavior -> crash
messages: + msg175387

versions: + Python 3.3, Python 3.4, - Python 2.6, Python 3.1
2012-11-11 16:52:34pitrousetnosy: + serhiy.storchaka
2012-11-11 04:08:53gregory.p.smithsetassignee: gregory.p.smith -> (no value)
2012-06-07 20:38:40gregory.p.smithsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2010-06-15 10:31:07eric.araujosetnosy: - docs@python
2010-06-15 10:22:26pitrousetnosy: + exarkun
2010-06-15 10:21:28apexosetfiles: + select.patch
keywords: + patch
messages: + msg107863
2010-06-01 07:09:26apexosetfiles: + concurrent_poll_and_register.py

messages: + msg106829
2010-06-01 06:35:20apexosetfiles: + select_threads.py

messages: + msg106827
2010-05-31 22:24:08pitrousetversions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5
nosy: + pitrou

messages: + msg106816

assignee: docs@python -> (no value)
components: - Documentation
2010-05-31 22:20:13apexocreate