classification
Title: select.select() corner cases: duplicate fds, out-of-range fds
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 3.2

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: berker.peksag, cdleary, docs@python, exarkun, georg.brandl, tshepang
Priority: normal
Keywords:

Created on 2010-01-11 06:52 by cdleary, last changed 2022-04-11 14:56 by admin.

Messages (2)
msg97578 - Author: Chris Leary (cdleary) Date: 2010-01-11 06:52
I was just reading through this ACM article that enumerates some of the issues with the select function in .NET: http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

select.select() currently suffers from the same documentation problem: its behavior when one of the sequences (e.g. rlist) contains duplicate and/or out-of-range file descriptors is not described.

Given the current implementation of seq2set in trunk it appears that:

1. A ValueError is raised when a given file descriptor is out of range. (Typically a result of the programmer passing a non-fd value, since FD_SETSIZE is "normally at least equal to the maximum number of descriptors supported by the system.")

2. Duplicate file descriptor numbers are collapsed into the fd_set, and are therefore idempotent at a system API level.
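
A minimal sketch of point 1, assuming a POSIX build of CPython where FD_SETSIZE is far below 10**6 (on Windows the range check is skipped and a different error is likely). The helper name out_of_range_error is hypothetical, used only for illustration:

```python
import select


def out_of_range_error(fd):
    """Return the ValueError message select() raises for fd, or None.

    Assumption: fd is larger than FD_SETSIZE on this platform, so the
    check happens before any system call is made.
    """
    try:
        select.select([fd], [], [], 0)
    except ValueError as exc:
        return str(exc)
    return None


# 10**6 still fits in a C int, but is well above a typical FD_SETSIZE.
print(out_of_range_error(10**6))
```

The error comes from the Python-level argument handling, not from the operating system, which is why it surfaces as ValueError rather than OSError.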

However, the language-level support code generally assumes no duplication, as there is a fixed-size array of (FD_SETSIZE + 1) pylist entries (the extra one holds a sentinel value). Although there is a TODO to size that array dynamically to the largest targeted file descriptor number, that would still assume one PyObject per file descriptor in the input sequences.

The set2list function used to produce a return value will, however, return duplicates: for each value in the input list, if the corresponding fd is set, that pyobject is added to the return list.
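
The duplicate behavior described above can be observed from Python. This is a sketch rather than a specification: it assumes socket.socketpair() is available on the platform, and the function name select_with_duplicates is made up for this example.

```python
import select
import socket


def select_with_duplicates():
    """Pass the same socket twice in rlist and count how often it comes back."""
    a, b = socket.socketpair()
    try:
        b.sendall(b"x")  # make `a` readable
        readable, _, _ = select.select([a, a], [], [], 1.0)
        # Count entries in the result that are the very same object as `a`.
        return sum(1 for obj in readable if obj is a)
    finally:
        a.close()
        b.close()


print(select_with_duplicates())
```

Because the result is built by walking the input list and testing each entry's fd against the fd_set, a ready descriptor listed twice is reported twice.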


Proposed Changes
----------------

At a glance it would seem that the Right Thing to do is to collapse duplicates in the input, as if we created a set(AsFileDescriptor(o) for o in input_list), so that no duplicates will be returned in the result. However, you *can* have a heterogeneous input list containing an integer fd such as 5 and a file-like object whose fileno() resolves to 5, in which case you don't want to arbitrarily choose only one of those PyObjects to return. Therefore, I'm thinking it's probably best to leave it as-is and document it.
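
To make the trade-off concrete, here is a caller-side sketch of collapsing duplicates by descriptor number. Both unique_by_fd and FakeFile are hypothetical names invented for this illustration; nothing like them exists in the select module.

```python
class FakeFile:
    """Stand-in for a file-like object; only fileno() matters here."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        return self._fd


def unique_by_fd(objects):
    """Keep the first object seen for each descriptor number."""
    first_seen = {}
    for obj in objects:
        fd = obj if isinstance(obj, int) else obj.fileno()
        first_seen.setdefault(fd, obj)  # later objects with the same fd are dropped
    return list(first_seen.values())


wrapped = FakeFile(5)
kept = unique_by_fd([5, wrapped, 7])
print(kept)             # the bare integer 5 is kept
print(wrapped in kept)  # the wrapper for the same fd is silently dropped
```

Which of the two objects survives depends only on input order, which is the arbitrary choice the paragraph above argues against making inside select.select() itself.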

In any case, if we want to explicitly allow duplicates in the input list, we should probably make the pylist arrays dynamically sized to match the lengths of the corresponding input lists, for correctness.

If this all makes sense I'll be happy to come up with a module/documentation/unit test patch.
msg109992 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-11 11:20
Chris, to me it's as clear as mud but please produce a doc patch anyway. :)
History
Date                 User           Action  Args
2022-04-11 14:56:56  admin          set     github: 51923
2018-09-27 19:45:42  berker.peksag  set     nosy: + berker.peksag
2014-02-03 17:04:27  BreamoreBoy    set     nosy: - BreamoreBoy
2013-07-31 13:31:50  tshepang       set     nosy: + tshepang
2010-07-11 11:44:01  pitrou         set     assignee: docs@python -> ; versions: + Python 3.2; nosy: + exarkun; components: - Documentation
2010-07-11 11:20:23  BreamoreBoy    set     assignee: georg.brandl -> docs@python; messages: + msg109992; nosy: + BreamoreBoy, docs@python
2010-01-11 06:52:29  cdleary        create