Issue 30931: Race condition in asyncore may access the wrong dispatcher

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/75114

classification

Title:	Race condition in asyncore may access the wrong dispatcher
Type:	resource usage	Stage:	resolved
Components:		Versions:	Python 3.7, Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:	Close asyncore/asynchat/smtpd issues and list them here View: 45552
Assigned To:	giampaolo.rodola	Nosy List:	Nir Soffer, giampaolo.rodola, vstinner, walkhour
Priority:	normal	Keywords:

Created on 2017-07-14 16:44 by walkhour, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 2707	closed	walkhour, 2017-07-14 16:48
PR 2764	closed	walkhour, 2017-07-19 15:54
PR 2854	closed	vstinner, 2017-07-24 22:29

Messages (15)
msg298359 - (view)	Author: Jaume (walkhour) *	Date: 2017-07-14 16:44
If a socket is closed and polled here https://github.com/python/cpython/blob/0d0a32fb91cdfea1626e6c6b77a9bc44e15a2b8a/Lib/asyncore.py#L183 the flag returned will be select.POLLNVAL (note that this can happen despite dispatcher.close() being called). This wouldn't be a problem but before here https://github.com/python/cpython/blob/0d0a32fb91cdfea1626e6c6b77a9bc44e15a2b8a/Lib/asyncore.py#L185 a new socket may be created with the same fd (which wouldn't be strange since that fd is now available), so the retrieved socket will be the newly created one instead of the old one and the new one will be close. This is regularly happening to us, I could try to explain better, but that's really it.
msg298362 - (view)	Author: Nir Soffer (Nir Soffer)	Date: 2017-07-14 17:17
Can you provide a minimal reproducer, or best add a failing test?
msg298682 - (view)	Author: Nir Soffer (Nir Soffer)	Date: 2017-07-19 16:21
Adding more info after discussion in github. After polling readable/writeable dispatchers, asyncore.poll(2) receive a list of ready file descriptors, and invoke callbacks on the dispatcher objects. If a dispatchers is closed and and a new dispatcher is created, the new dispatcher may get the same file descriptor. If the file descriptor was in the ready list returned from poll()/select(), asyncore may try to invoke one of the callbacks (e.g. handle_write) on the new dispatcher. Here is an example in asycore.poll() r, w, e = select.select(r, w, e, timeout) for fd in r: obj = map.get(fd) if obj is None: continue read(obj) read close obj, removing fd from map, then creates a new one getting the same fd... for fd in w: obj = map.get(fd) this get the new object from the map, instead of the closed one. if obj is None: continue write(obj) invoke write on the wrong socket, which is not writable for fd in e: obj = map.get(fd) same issue here if obj is None: continue _exception(obj) asyncore.poll2() has same issue: r = pollster.poll(timeout) for fd, flags in r: obj = map.get(fd) if obj is None: continue readwrite(obj, flags) fd may have been closed and recreated by in a previous iteration of the loop. This issue is demonstrated in the failing test: https://github.com/python/cpython/pull/2707/commits/5aeb0098d2347984f3a89cf036c305edd2b8382b
msg299030 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-24 23:14
On my PR 2854, Nir added these comments (extract): "And now we try to read from close dispatcher. I think we should wait for #2804 (...)" Sorry, I don't understand. My PR fixes described the bug that you described in msg298682: "If a dispatchers is closed and and a new dispatcher is created, the new dispatcher may get the same file descriptor. If the file descriptor was in the ready list returned from poll()/select(), asyncore may try to invoke one of the callbacks (e.g. handle_write) on the new dispatcher." About reading from closed dispatcher: sorry, I don't know the asyncore concurrency model, but I know the asyncio concurrency model. In asyncio, I don't see why a "dispatcher" would affect any "unrelated" dispatcher. Said differently: if a dispatcher A closes the dispatcher B and dispatcher B is "readable", the read event will be handled anyway. The "close event" will be handled at the next loop iteration. The oversimplified asyncio model is "each iteration is atomic". Closing a dispatcher calls its del_channel() which removes its file descriptor from the asyncore map. -- If I understand correctly, the current behaviour depends on the file descriptor order, since select.select() and select.poll.poll() returns ordered file descriptors (smallest to largest). If a dispatcher A is called and closes a dispatcher B, the dispatcher B is called or not depending on the file descriptor. If A.fileno() < B.fileno(), B is not called. If A.fileno() > B.fileno(), B was already called. Am I right? What is the use case of a dispatcher closing another one? What is the consequence on my PR? Does the dispatcher B expects to not be called to read if it was already closed? I see that close() also immediately closes the underlying socket. Does it mean that my PR can triggered OSError since we try recv() on a closed socket?
msg299132 - (view)	Author: Nir Soffer (Nir Soffer)	Date: 2017-07-25 18:58
On my PR 2854, Nir added these comments (extract): > "And now we try to read from close dispatcher. I think we should wait > for #2804 (...)" > Sorry, I don't understand. My PR fixes described the bug that you > described in msg298682: > "If a dispatchers is closed and and a new dispatcher is created, the > new dispatcher may get the same file descriptor. If the file > descriptor was in the ready list returned from poll()/select(), > asyncore may try to invoke one of the callbacks (e.g. handle_write) > on the new dispatcher." > About reading from closed dispatcher: sorry, I don't know the asyncore > concurrency model, but I know the asyncio concurrency model. In > asyncio, I don't see why a "dispatcher" would affect any "unrelated" > dispatcher. Said differently: if a dispatcher A closes the dispatcher > B and dispatcher B is "readable", the read event will be handled > anyway. The "close event" will be handled at the next loop iteration. > The oversimplified asyncio model is "each iteration is atomic". I don't know asyncio enough, but I suspect this issue may effect it if it is using the fd to check if a reader/writer is closed. This is the main issue in asyncore, assuming that during the loop: obj = map.get(fd) Means that a obj is the dispatcher that owned fd when preparing for poll()/select(), and we can use obj for calling the I/O callbacks. The dispatcher owning this fd may have been closed during the loop, and a new dispatcher may have been created with the same fd. The only way to check if a dispatcher is closed is to check the object. > Closing a dispatcher calls its del_channel() which removes its file > descriptor from the asyncore map. And creating new one at that point will add a new channel using the same fd. > If I understand correctly, the current behaviour depends on the file > descriptor order, since select.select() and select.poll.poll() returns > ordered file descriptors (smallest to largest). If a dispatcher A is > called and closes a dispatcher B, the dispatcher B is called or not > depending on the file descriptor. If A.fileno() < B.fileno(), B is not > called. If A.fileno() > B.fileno(), B was already called. Am I right? I don't think the order matter, only the fact that closing a dispatcher and creating a new one is likely to use the same fd. > What is the use case of a dispatcher closing another one? What is the > consequence on my PR? Does the dispatcher B expects to not be called > to read if it was already closed? I don't know what is the use case of the reporter, but here is one possible example - you implement a proxy, each proxy connection has 2 legs, each using a dispatcher connected to an endpoint. If one leg is closed, you close also the other leg (one dispatcher closing another). Now if you create a new dispatcher during the same loop iteration, it may get the same fd of the other closed leg, and asyncore may try to read/write with this dispatcher. This may work or not depending on how you implement the dispatcher. This is domonstrated in https://github.com/python/cpython/pull/2764/commits/5aeb0098d2347984f3a89cf036c305edd2b8382b > I see that close() also immediately closes the underlying socket. Does > it mean that my PR can triggered OSError since we try recv() on a > closed socket? Yes, this is a typical issue in asyncore if you don't protect your subclass to handle double close. asyncore.file_wrapper is protected, but asyncore.dispatcher and its subclasses are not.
msg299144 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-07-25 21:21
> Yes, this is a typical issue in asyncore if you don't protect your > subclass to handle double close. I use the same trick all over the place in pyftpdlib: https://github.com/giampaolo/pyftpdlib/blob/1268bb185cd63c657d78bc33309041628e62360a/pyftpdlib/handlers.py#L537 In practical terms, does this bug report aim to fix this issue?
msg299150 - (view)	Author: Nir Soffer (Nir Soffer)	Date: 2017-07-25 21:52
> I use the same trick all over the place in pyftpdlib: > https://github.com/giampaolo/pyftpdlib/blob/1268bb185cd63c657d78bc33309041628e62360a/pyftpdlib/handlers.py#L537 This allow detection of closed sockets, but does not fix the issue of accessing the wrong dispatcher. > In practical terms, does this bug report aim to fix this issue? Yes, see the attached PR's.
msg299156 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-25 22:42
I'm not sure that it's a good idea to compare asyncore and asyncio. While their name are similar, their design are very different. I'm only talking about the kernel, the core event loop checking for file descriptors. In asyncio, when you close a transport, the transport is not closed immediately. It is scheduled to be closed as soon as possible: usually in the next loop iteration, but it can longer to complex transports like subprocesses or TLS connections. Thanks to this design, asyncio doesn't have the race condition described in this issue.
msg299158 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-25 22:51
If I understood correctly, to fix this race condition, there are two main constraints: (A) We must call handlers of dispatcher registered before calling select()/poll(): dispatchers can be closed and their file descriptor can be reused while we execute other dispatchers (B) Don't call closed dispatcher -- For (A), I only see one option: copy somehow the map, for example using: ready = [(map[fd], flags) for fd, flags in r] -- For (B), it's more tricky. The bpo-30985 proposes to start using the closing: I'm not sure that it's doable in a stable version like 3.6, or worse in 2.7, since existing code may use closing differently. http://bugs.python.org/issue30985#msg299154 I'm also concerned about subclassing: if a subclass overrides completely close(), we would fail to detect that the dispatcher was closed. Maybe this corner doesn't occur in the wild? I see a different approach: while iterating on "ready" objects, check again if map.get(fd) still is the expected dispatcher. If the file descriptor is no more registered in map: it was closed. If we get a new dispatcher: not only the previous dispatcher was closed, but a new dispatcher also got the same file descriptor and was registered in the map. Do you think that this approach would work? Would it be safer in term of backward compatibility?
msg299162 - (view)	Author: Jaume (walkhour) *	Date: 2017-07-25 23:08
There's an alternative fix which follows a similar approach to the one you mention: https://github.com/python/cpython/pull/2707/. I personally don't like too much to reuse again again an old variable as it's true that we don't know how people are using it (although I take they use it as a boolean since it's assigned to False). On the other side this approach copies the map when it's not strictly necessary to do use just because we can't do it the proper way by using closing, as shown in the PR using it: https://github.com/python/cpython/pull/2764/files
msg299164 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-25 23:21
> There's an alternative fix which follows a similar approach to the one you mention: https://github.com/python/cpython/pull/2707/. Sorry, I'm slow to understand. Now that the bug and expected behaviour is better explained, I can now review correctly your PR and I like it :-) While you wrote you previous comment, I updated my PR to implement a similar idea: https://github.com/python/cpython/pull/2854 My code now detects if a dispatcher was closed, but also if it's file descriptor was reused by a newly registered dispatcher. Our PR are similar: * create a copy before starting to run handler * check if our copy is still consistent to the current map: if not, skip the dispatcher handler My PR is different: * It only creates a list of ready objects: it might be cheaper if only few file descriptors are ready, whereas the map is large... I don't think that it's a real performance bottleneck in practice, but well, I'm just trying to compare our implementations :-) ... To be honest, my PR is a copy of your other PR, not you wrote both versions in fact ;-) (that's why I credited you as co-author in my PR) * My PR has a simpler and more specific unit test: only testing the poll() function, no server, no client: just handlers. * My PR also tests when a dispatcher was only closed, not replaced. -- Good! It seems like slowly we converge to a solution, no? Now, it's more bikeshedding on the exact implementation ;-)
msg299300 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-27 10:11
Ok, wow, the discussion on this issue, bpo-30985, PR 2854, PR 2804, etc. was very productive. We identified much more race conditions than the first one that I spotted. (Bug 1) The dispatcher A closes the dispatcher B. Currently, asyncore calls the handlers of the dispatcher B. Technically, the close() method of a dispatcher closes immediately a socket object. So it's very likely that handlers of the dispatcher B fails on recv(), send(), or any other operation on the socket. There is also a risk that somehow dispatcher B tries to closes again the socket (technically, socket, pipe or something else) and gets a OSError(EBADF) or worse: closes an unrelated socket! (Bug 2) Similar to (Bug 1), but the subtle coming from the current implementation causes even worse bugs. The dispatcher A closes the dispatcher B, and then the dispatcher A (or another dispatcher, it doesn't matter who make that) creates a new dispatcher C which reuses the same file descriptor than the old closed file descriptor of dispatcher B. The consequence is that handlers of dispatcher C are called even if the socket is not ready to read or to write. Since sockets are non-blocking, recv() and send() are likely to fail with BlockingIOError. These errors are usually correctly handled in asynchronous code, but they can cause bugs. More generally, calling handlers when the socket is not ready can cause many kinds of bugs. (Bug 3) If a socket is ready to read and ready to write, handle_read() and handle_write() handlers are expected to be called. The problem is when handle_read() (first called handler) closes the dispatcher. It was decided to not call handle_write() is that case. (Bug 4) Ok, this one is the best one :-) When a socket is ready, asyncore calls methods like handle_write_event(). These methods usually call a single handler like handle_write(), but sometimes can call multiple handlers like handle_connect() followed by handle_write(). If a first handler closes the dispatcher, the following handler may fail badly on trying to use a closed socket. IMHO (Bug 3) and (Bug 4) are less surprising than (Bug 1) or (Bug 2) since they are determistic and so less "race condition". Since these bugs are determistic, I expect that applications are already written to handle these corner cases, somehow. The simplest option is to catch and ignore errors, asyncore already ignores many errors for us. For example asyncore already ignores EBADF, whereas IMHO it hides a design issue (the 4 bugs I just listed). Technically, fixing (Bug 3) using my PR 2854 design (check if map[fd] is still the same dispatcher) is safe since it doesn't rely at all on the actual implementation of asyncore handle_xxx_event() (which can be overriden) not handle_xxx() methods defined by third party code. Fixing (Bug 4) is more complex since it requires to decide how to detect that a dispatcher was closed (it seems like "dispatcher._fileno is None" test is the best choice), and to modify handle_xxx_event() methods, whereas these methods can and are overridden in subclasses. -- About backward compatibility, in short, fixing any of these bug changes the exact behaviour, since fixing all these bugs require to not call a handler if the dispatcher was closed. If a specific application requires that handlers are called anyway, changes are likely to break these applications. A workaround for these applications is to copy asyncore.py from an old Python version to keep the exact same behaviour. Another smoother workaround is to monkey-patch poll() and poll2(), and maybe also override handle_xxx_event() methods, to restore the old behaviour. I don't want to add a flag to re-enable the old behaviour, since it seems like we all agree that the described bugs are clearly very bad design bugs and that they must be fixed. -- Another option, Nir will probably not like it, is to consider that asyncore is dead and will not be fixed. That's not my favorite option. Nir explained that on Python 2.7, asyncore is the most convenient module for his use case, he deeply rely on it, and so "just want" to fix know bugs. -- I suggest to use this bpo to fix (Bug 1), (Bug 2) and (Bug 3), since they can all be fixed in poll() and poll2() with the same design, like my PR 2854. For (Bug 4), I suggest to use bpo-30985 where we already discussed many options how to detect that a dispatcher was closed. I have no strong opinion on this bug, if it should be fixed or not. IMHO it's less important and applications already know how handle the bug.
msg299301 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-27 10:14
Much shorter summary: * Handlers of dispatcher must not be called if the dispatcher was called. * If a dispatcher is closed between the execution of two of his handlers, the next handlers should not be called My PR 2854 adds 3 unit tests to verify this behaviour of 3 different "race conditions".
msg299350 - (view)	Author: Nir Soffer (Nir Soffer)	Date: 2017-07-27 20:47
Victor, I mostly agree with you, but I think we have here several bugs, and we should do the minimal fix for each of them. Your PR is too big, trying to fix too much. (Bug 1) The dispatcher A closes the dispatcher B. Currently, asyncore calls the handlers of the dispatcher B. We don't have such bug now - if dispatcher B is closed, it is not in the map, and current code will skip the fd when checking: for fd in r: obj = map.get(fd) if obj is None: continue (Bug 2) Dispatcher A close Dispatcher B, create Dispatcher C. C get may use the same fd as B. If B was ready, asyncore may get C from the map and access it instead of B. This issue is reported in this bug. (Bug 3) if handle_read() closed the dispatcher, asyncore will call handle_write() on a closed dispatcher. This is a very old bug with asyncore.readwrite(). (Bug 4) handle_xxx_event() internal methods may call multiple handle_xxx() user methods, again not checking if dispatcher was closed after each invocation. Same, very old bug. So I suggest that we fix only bug 2 in https://github.com/python/cpython/pull/2854. The issue in readwrite() can be solved with the approach you suggest, but I prefer to make small an careful steps so we don't introduce regressions in stable versions. Also, there is already too much noise here an in the various PRs about this issue, better file bugs for the rest of the issues and discuss them separately. This commit https://github.com/python/cpython/pull/2854/commits/bbd2d09ab999fa2214cbbd2589ae3642facd3057 Looks fine with the test_poll_close_replace_dispatchers test added in the later commit.
msg299461 - (view)	Author: Giampaolo Rodola' (giampaolo.rodola) *	Date: 2017-07-29 09:52
+1, I would push https://github.com/python/cpython/pull/2854/ first and fix the race condition only.

History
Date	User	Action	Args
2022-04-11 14:58:49	admin	set	github: 75114
2021-10-21 11:27:33	iritkatriel	set	status: open -> closed superseder: Close asyncore/asynchat/smtpd issues and list them here resolution: wont fix stage: resolved
2017-07-29 09:53:08	giampaolo.rodola	set	assignee: giampaolo.rodola
2017-07-29 09:52:24	giampaolo.rodola	set	messages: + msg299461
2017-07-27 20:47:50	Nir Soffer	set	messages: + msg299350
2017-07-27 10:14:20	vstinner	set	messages: + msg299301
2017-07-27 10:11:28	vstinner	set	messages: + msg299300
2017-07-25 23:21:08	vstinner	set	messages: + msg299164
2017-07-25 23:08:20	walkhour	set	messages: + msg299162
2017-07-25 22:51:31	vstinner	set	messages: + msg299158
2017-07-25 22:42:04	vstinner	set	messages: + msg299156
2017-07-25 21:52:05	Nir Soffer	set	messages: + msg299150
2017-07-25 21:21:02	giampaolo.rodola	set	nosy: + giampaolo.rodola messages: + msg299144
2017-07-25 18:58:07	Nir Soffer	set	messages: + msg299132
2017-07-24 23:14:22	vstinner	set	nosy: + vstinner messages: + msg299030
2017-07-24 22:29:18	vstinner	set	pull_requests: + pull_request2905
2017-07-21 00:07:26	vstinner	set	versions: - Python 3.3, Python 3.4
2017-07-19 16:21:53	Nir Soffer	set	messages: + msg298682 title: Race condition in asyncore wrongly closes channel -> Race condition in asyncore may access the wrong dispatcher
2017-07-19 15:54:35	walkhour	set	pull_requests: + pull_request2823
2017-07-14 17:17:04	Nir Soffer	set	nosy: + Nir Soffer messages: + msg298362
2017-07-14 16:48:25	walkhour	set	pull_requests: + pull_request2771
2017-07-14 16:44:47	walkhour	set	type: resource usage versions: + Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
2017-07-14 16:44:08	walkhour	create