
classification
Title: multiprocessing.Queue uses select()
Type: behavior
Stage:
Components: IO
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7, Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: William.Edwards, giampaolo.rodola, sbt
Priority: normal
Keywords:

Created on 2012-10-17 21:28 by William.Edwards, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg173208 - Author: William Edwards (William.Edwards) Date: 2012-10-17 21:28
If you have 1024 file descriptors already open, the file descriptors created internally in multiprocessing.Queue will be numbered 1024 or higher, and the select() call buried deep in the Queue will raise an exception.

In fact, all uses of select() in the Python libs should use poll() instead where available, obviously.
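
The failure described above can be reproduced without a Queue. The following is a minimal sketch, assuming a Unix platform and an open-file limit (RLIMIT_NOFILE) above 1024; the helper name high_fd_demo is hypothetical. It duplicates a pipe end onto a descriptor numbered 1024 or higher, then checks whether select() rejects it and whether poll() still reports it readable.

```python
import fcntl
import os
import select


def high_fd_demo(min_fd=1024):
    """Return (select_failed, poll_ready), or None if no high fd is available."""
    r, w = os.pipe()
    try:
        try:
            # F_DUPFD returns the lowest free descriptor >= min_fd.
            high = fcntl.fcntl(r, fcntl.F_DUPFD, min_fd)
        except OSError:
            # Assumption: this happens when RLIMIT_NOFILE is too low.
            return None
        try:
            os.write(w, b"x")  # make the read end readable
            try:
                select.select([high], [], [], 0)
                select_failed = False
            except ValueError:
                # CPython refuses descriptors that do not fit in an fd_set.
                select_failed = True
            poller = select.poll()
            poller.register(high, select.POLLIN)
            poll_ready = bool(poller.poll(0))  # timeout is in milliseconds
            return select_failed, poll_ready
        finally:
            os.close(high)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(high_fd_demo())
```

If the process limit does not allow descriptors that high, the helper returns None instead of a result.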
msg173215 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2012-10-17 22:34
On one hand this seems reasonable to me, on the other hand I'm not sure.
select(), besides being supported on all platforms, has the advantage of being simple and quick to use (you just call it once, passing a set of fds, and you're done).

poll() / epoll() aren't as simple, since they require:

- e/poll() object initialization
- fds registration 
- fds unregistration
- e/poll() object destruction

Given the exact point where this is supposed to take place (here: http://hg.python.org/cpython/file/f6fcff683866/Lib/multiprocessing/connection.py#l865), I'm not sure it's really worth the effort: on one hand you fix a pretty rare scalability issue; on the other hand you introduce a considerable slowdown, given the number of operations involved and described above.
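
For reference, the four steps listed in this message look roughly like this at the Python level. This is only a sketch, assuming a platform where select.poll() exists; wait_readable is a hypothetical helper, not the code in multiprocessing/connection.py. Note that a select.poll object has no close() method (unlike select.epoll), so "destruction" here is just garbage collection.

```python
import os
import select


def wait_readable(fds, timeout=None):
    """Return the fds that poll() reports as ready (hypothetical helper)."""
    poller = select.poll()                    # pollster initialization
    for fd in fds:
        poller.register(fd, select.POLLIN)    # fd registration
    try:
        # poll() takes milliseconds; None means block indefinitely.
        ms = None if timeout is None else int(timeout * 1000)
        events = poller.poll(ms)
    finally:
        for fd in fds:
            poller.unregister(fd)             # fd unregistration
    return [fd for fd, _event in events]


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"x")
    print(wait_readable([r], timeout=0) == [r])
    os.close(r)
    os.close(w)
```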
msg173216 - Author: Richard Oudkerk (sbt) (Python committer) Date: 2012-10-17 23:37
> select() other than being supported on all platforms has the advantage of 
> being simple and quick to use (you just call it once by passing a set of fds 
> and then you're done).

Do you mean at the C level?  Wouldn't you just do

  struct pollfd pfd = {fd, POLLIN, 0};
  if (poll(&pfd, 1, timeout) < 0) {...}
  ready = pfd.revents != 0;

That does not look any less simple and quick.

> on the other hand you introduce a considerable slowdown given the amount 
> of operations involved and described above.

poll(), unlike select(), does not have to scan an fd_set (of 1024 bits?) so I would have expected it to be faster if anything.

At the Python level, creating a new poll object each time might indeed be slower, but one could always cache it on the queue object.
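
The caching idea might look something like the sketch below. CachedPoller is a hypothetical class used only for illustration; it is not how multiprocessing.Queue is implemented, and it assumes the watched descriptor stays open for the lifetime of the object.

```python
import os
import select


class CachedPoller:
    """Hypothetical helper: create and register the poll object only once."""

    def __init__(self, fd):
        self._poller = select.poll()
        self._poller.register(fd, select.POLLIN)

    def ready(self, timeout=None):
        # poll() takes milliseconds; None means block indefinitely.
        ms = None if timeout is None else int(timeout * 1000)
        return bool(self._poller.poll(ms))


if __name__ == "__main__":
    r, w = os.pipe()
    cached = CachedPoller(r)
    print(cached.ready(0))   # nothing written yet
    os.write(w, b"x")
    print(cached.ready(0))   # data is now pending
    os.close(r)
    os.close(w)
```

Each call to ready() then costs a single poll() call, with no per-call registration or unregistration.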

BTW, are there any non-Windows platforms which support multiprocessing but don't have poll()?  (On Windows WaitForSingleObject() is used instead.)
msg173218 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2012-10-18 00:03
> Do you mean at the C level?

No, Python of course.

> poll(), unlike select(), does not have to scan an fd_set 
> (of 1024 bits?) so I would have expected it to be faster if anything.

That might be true in a continuous loop (e.g. a reactor).
Judging from where this is supposed to take place (http://hg.python.org/cpython/file/f6fcff683866/Lib/multiprocessing/connection.py#l865), what you would end up doing within the wait() function is:

- init_pollster()
- register(fd) * num of fds
- unregister(fd) * num of fds
- close_pollster()

...and I suspect that's likely to be slower than just using select(), even if you cache the poll object. Anyway, I might be wrong, and figuring that out with a simple benchmark is easy.
Other than that, I'm not sure how often wait() usually gets called, so even if a slowdown is introduced, it might not even be a problem.
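
A benchmark along the lines suggested here could be sketched as follows, assuming a Unix platform with select.poll(). The function name bench and its parameters are hypothetical, and the timings depend on the machine and the number of descriptors, so no result is claimed.

```python
import os
import select
import timeit


def bench(n_fds=8, repeat=10000):
    """Time one-shot select() against create/register/poll/unregister."""
    pipes = [os.pipe() for _ in range(n_fds)]
    rfds = [r for r, _w in pipes]

    def use_select():
        select.select(rfds, [], [], 0)

    def use_poll():
        poller = select.poll()
        for fd in rfds:
            poller.register(fd, select.POLLIN)
        poller.poll(0)
        for fd in rfds:
            poller.unregister(fd)

    try:
        return {
            "select": timeit.timeit(use_select, number=repeat),
            "poll": timeit.timeit(use_poll, number=repeat),
        }
    finally:
        for r, w in pipes:
            os.close(r)
            os.close(w)


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds:.4f} s")
```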
msg173219 - Author: Giampaolo Rodola' (giampaolo.rodola) (Python committer) Date: 2012-10-18 00:13
Speaking of which, it seems this is a duplicate of issue 10527.
History
Date                 User              Action  Args
2022-04-11 14:57:37  admin             set     github: 60473
2012-10-18 00:29:25  giampaolo.rodola  set     status: open -> closed; resolution: duplicate
2012-10-18 00:13:13  giampaolo.rodola  set     messages: + msg173219
2012-10-18 00:03:50  giampaolo.rodola  set     messages: + msg173218
2012-10-17 23:37:59  sbt               set     messages: + msg173216
2012-10-17 22:34:49  giampaolo.rodola  set     messages: + msg173215
2012-10-17 21:56:20  giampaolo.rodola  set     nosy: + giampaolo.rodola
2012-10-17 21:49:17  pitrou            set     nosy: + sbt; versions: + Python 3.2, Python 3.3, Python 3.4
2012-10-17 21:28:38  William.Edwards   create