Implementing Solaris "/dev/poll" in the "select" module #50646

Closed

jcea opened this issue Jul 1, 2009 · 22 comments
Labels: stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

@jcea (Member) commented Jul 1, 2009

BPO 6397
Nosy @jcea, @pitrou, @giampaolo
Files
  • 528fdd816160.diff
  • 518b32ce893e.diff
  • 2506e49b9f71.diff
  • 9d687fdd924d.diff
  • 11f08326afd0.diff
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/jcea'
    closed_at = <Date 2011-11-14.18:18:54.823>
    created_at = <Date 2009-07-01.16:49:13.831>
    labels = ['type-feature', 'library']
    title = 'Implementing Solaris "/dev/poll" in the "select" module'
    updated_at = <Date 2011-11-14.18:18:54.823>
    user = 'https://github.com/jcea'

    bugs.python.org fields:

    activity = <Date 2011-11-14.18:18:54.823>
    actor = 'jcea'
    assignee = 'jcea'
    closed = True
    closed_date = <Date 2011-11-14.18:18:54.823>
    closer = 'jcea'
    components = ['Library (Lib)']
    creation = <Date 2009-07-01.16:49:13.831>
    creator = 'jcea'
    dependencies = []
    files = ['23626', '23636', '23643', '23646', '23684']
    hgrepos = ['86']
    issue_num = 6397
    keywords = ['patch', 'needs review']
    message_count = 22.0
    messages = ['89989', '89991', '146041', '146445', '146457', '146459', '146461', '147226', '147316', '147350', '147354', '147355', '147358', '147359', '147365', '147366', '147367', '147368', '147369', '147377', '147589', '147626']
    nosy_count = 7.0
    nosy_names = ['jcea', 'exarkun', 'pitrou', 'giampaolo.rodola', 'neologix', 'rosslagerwall', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue6397'
    versions = ['Python 3.3']

    @jcea (Member, Author) commented Jul 1, 2009

    In Python 2.6 we added support for Linux "epoll" and *BSD "kqueue" in
    the select module. I think we should add support for the Solaris "poll"
    interface too.

    What do you think?

    I volunteer to do the work, if you agree this is a feature we want to
    have. I think so.

    @jcea self-assigned this on Jul 1, 2009
    @jcea added the stdlib (Python modules in the Lib dir) and type-feature (A feature request or enhancement) labels on Jul 1, 2009
    @exarkun (Mannequin) commented Jul 1, 2009

    Solaris 10 introduced "The Event Completion Framework". I am not
    particularly familiar with Solaris, so I couldn't say whether it would
    be better to target this or the older /dev/poll. Some documentation
    suggests that "The Event Completion Framework" is somewhat preferred:

    http://developers.sun.com/solaris/articles/event_completion.html

    It suggests that /dev/poll is not as performant, but I'm not sure I
    believe it. One feature it does seem to have that puts it ahead of
    /dev/poll is support for various non-fd event sources (e.g., timers).
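
    For context, a minimal sketch of the event ports flavour of that framework (port_create/port_associate/port_get); the stdin descriptor is just a stand-in and error handling is omitted:

    """
    #include <poll.h>
    #include <port.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int port = port_create();   /* create an event port */
        int fd = 0;                 /* stand-in descriptor (stdin) */

        /* Express interest in readability of a file descriptor. */
        port_associate(port, PORT_SOURCE_FD, (uintptr_t)fd, POLLIN, NULL);

        /* Block until one event arrives. Timers, AIO and user events
        ** can post to the same port, which is the non-fd advantage
        ** mentioned above. */
        port_event_t ev;
        port_get(port, &ev, NULL);
        printf("source=%d object=%ld\n", ev.portev_source, (long)ev.portev_object);
        return 0;
    }
    """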

    @jcea (Member, Author) commented Oct 20, 2011

    I want to move this forward.

    Apparently, "/dev/poll" could actually be used transparently in Python's "select.poll()" implementation. The semantics seem to be the same, so we could choose between the "poll" syscall and "/dev/poll" either statically at compile time or dynamically at poll-object creation time (try to open /dev/poll and fall back to the "poll" syscall if that fails).

    Some details:

    http://developers.sun.com/solaris/articles/using_devpoll.html
    http://developers.sun.com/solaris/articles/polling_efficient.html

    I agree that the Solaris 10 event framework would be nice to support too, but that would be a separate feature request.
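
    For reference, a minimal sketch of the raw /dev/poll protocol those articles describe (registration by write()ing struct pollfd entries, waiting via the DP_POLL ioctl, removal via POLLREMOVE); the function name, buffer size and lack of error handling are illustrative only:

    """
    #include <fcntl.h>
    #include <poll.h>
    #include <stropts.h>
    #include <sys/devpoll.h>
    #include <unistd.h>

    static int
    wait_for_read(int sockfd)
    {
        int dpfd = open("/dev/poll", O_RDWR);

        /* Register interest: write a struct pollfd to the device. */
        struct pollfd reg = { .fd = sockfd, .events = POLLIN, .revents = 0 };
        write(dpfd, &reg, sizeof(reg));

        /* Wait: DP_POLL fills an array with the ready descriptors. */
        struct pollfd ready[64];
        struct dvpoll dvp = { .dp_fds = ready, .dp_nfds = 64, .dp_timeout = -1 };
        int n = ioctl(dpfd, DP_POLL, &dvp);

        /* Deregister: write the same fd with events = POLLREMOVE. */
        reg.events = POLLREMOVE;
        write(dpfd, &reg, sizeof(reg));

        close(dpfd);
        return n;   /* number of ready descriptors, or -1 on error */
    }
    """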

    @jcea (Member, Author) commented Oct 26, 2011

    First version of the patch. Review 0ee4386d8f51.diff http://bugs.python.org/file23526/0ee4386d8f51.diff

    Details:

    1. The current code aliases "devpoll" on platforms with "/dev/poll" (Solaris and derivatives). Considering all the other points, I think that providing a separate "select.devpoll" object would be a better idea. Opinions?

    2. I am not providing the exception contract documented for "select.poll". For instance, "poll.unregister" of an fd not previously registered doesn't raise an exception; the same goes for "poll.modify", etc. I could add that, but it would be pointless extra code.

    3. I have added a boolean "select.isdevpoll" to indicate whether "select.poll" uses "poll()" or "/dev/poll". Ugly.

    4. I release the GIL when waiting for the fds, but not when registering/unregistering fds, etc. I guess the syscall would be very fast, but I haven't actually measured it.

    5. The internal REMOVE is needed because if you "register" the same fd several times, the events are ORed together. To avoid that, I do an internal REMOVE first. I should scan the pollfds internally and update the events in place. Or provide a separate "devpoll" and document this fact...

    6. If the number of active fds is bigger than SIZE_DEVPOLL, only SIZE_DEVPOLL fds are returned. If you "poll" several times, you get the SAME fds while they are still active, so other active fds can suffer starvation. Solutions: self-tuning (if the module provides 20 slots and the OS fills all 20 with active fds, call again with 40 free slots, etc.; see the sketch after this list) or provide a separate "devpoll" and document this fact, possibly providing a "maxsize" parameter. "select.epoll" uses FD_SETSIZE directly.

    With this, I am starting to think that providing a separate "select.devpoll" is actually a better idea.

    Opinions?
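
    A minimal sketch of the self-tuning idea from point 6, assuming the level-triggered behaviour described above (polling again returns the same fds while they remain ready); the function and variable names are illustrative, not the patch code:

    """
    #include <poll.h>
    #include <stdlib.h>
    #include <stropts.h>
    #include <sys/devpoll.h>

    /* Poll via DP_POLL, doubling the result buffer whenever the kernel
    ** fills every slot we offered, so ready descriptors beyond the
    ** current capacity are not starved. Returns the number of ready
    ** fds, or -1 on error.
    */
    static int
    devpoll_selftuning(int dpfd, struct pollfd **buf, int *capacity, int timeout)
    {
        for (;;) {
            struct dvpoll dvp;
            dvp.dp_fds = *buf;
            dvp.dp_nfds = *capacity;
            dvp.dp_timeout = timeout;

            int n = ioctl(dpfd, DP_POLL, &dvp);
            if (n < *capacity)
                return n;       /* error (-1) or spare room: done */

            /* Every slot was used: grow and re-check without blocking;
            ** still-ready fds will simply be reported again. */
            struct pollfd *p = realloc(*buf, 2 * *capacity * sizeof(struct pollfd));
            if (p == NULL)
                return n;       /* keep the partial result on OOM */
            *buf = p;
            *capacity *= 2;
            timeout = 0;
        }
    }
    """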

    @jcea (Member, Author) commented Oct 26, 2011

    I have decided to segregate "select.devpoll" into a separate object, like "select.epoll".

    @jcea (Member, Author) commented Oct 26, 2011

    Solved points 1, 3 and 4.

    2 will be solved with the documentation.

    5 and 6 still pending.

    @jcea (Member, Author) commented Oct 26, 2011

    Documentation added. That solves 2 and 5.

    I still have to solve 6.

    @jcea changed the title from Implementing Solaris "poll" in the "select" module to Implementing Solaris "/dev/poll" in the "select" module on Oct 26, 2011
    @jcea (Member, Author) commented Nov 7, 2011

    Please, review.

    With the current code, each devpoll object has capacity for managing 256 fds by default. That is about 2048 bytes (256 entries times the 8-byte struct pollfd). The cost seems reasonable, since a normal program will have only a few devpoll objects around. I have considered an optional parameter to tune this, but the interaction with rlimit is messy. Even managing 65536 fds, the memory cost is about 512 kilobytes per devpoll object, and you can surely afford that if you are actually managing 65536 descriptors...

    The code is not thread-safe. It doesn't crash, but concurrent use of a devpoll object has undefined results.

    Please, review for integration.

    @jcea (Member, Author) commented Nov 8, 2011

    Please check the new changeset, with all your feedback incorporated. Thanks!

    @jcea (Member, Author) commented Nov 9, 2011

    Another changeset. Hopefully the final one :-).

    Please, review.

    @rosslagerwall (Mannequin) commented Nov 9, 2011

    + increases this value, c:func:`devpoll` will return a possible
    + incomplete list of active file descriptors.

    I think this should change to:

    + increases this value, c:func:`devpoll` will return a possibly
    + incomplete list of active file descriptors.

    or even better:

    + increases this value, c:func:`devpoll` may return an
    + incomplete list of active file descriptors.

    Cheers

    @jcea (Member, Author) commented Nov 9, 2011

    Thanks, Ross. Your suggestion has been committed to my branch.

    Waiting for more feedback.

    @rosslagerwall (Mannequin) commented Nov 9, 2011

    Is write()ing to a devpoll fd a blocking operation in the kernel?
    Does it need to have Py_BEGIN_ALLOW_THREADS around it?
    The same question applies to open()ing it.

    Obviously, the ioctl() call *is* blocking :-)
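
    For reference, a minimal sketch of what that wrapping looks like in a CPython extension; the variable names are illustrative, not the actual selectmodule.c code:

    """
    /* Release the GIL across the potentially blocking write(), so other
    ** Python threads can run while the kernel processes the pollfds. */
    Py_BEGIN_ALLOW_THREADS
    n = write(self->fd_devpoll, self->fds, size);
    Py_END_ALLOW_THREADS
    if (n == -1) {
        PyErr_SetFromErrno(PyExc_IOError);
        return -1;
    }
    """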

    @rosslagerwall (Mannequin) commented Nov 9, 2011

    Also, you can use Py_RETURN_NONE instead of:
    + Py_INCREF(Py_None);
    + return Py_None;
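
    Py_RETURN_NONE is a CPython macro that expands to exactly that incref-and-return pair, so it is a drop-in replacement; a minimal illustrative usage:

    """
    /* Illustrative only: a no-op method that returns None to Python. */
    static PyObject *
    devpoll_noop(PyObject *self, PyObject *args)
    {
        Py_RETURN_NONE;
    }
    """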

    @jcea (Member, Author) commented Nov 9, 2011

    These operations are potentially slow; there are no real-time guarantees.

    My machine is quite capable (16 CPUs), but let's see what a bit of DTrace scripting tells us:

    First, let's time the open:

    """
    syscall::open*:entry
    /copyinstr(arg0)=="/dev/poll"/
    {
    self->ts = timestamp;
    }

    syscall::open*:return
    /self->ts/
    {
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
    }
    """

    The result, times in nanoseconds:

    """
    value ------------- Distribution ------------- count
    2048 | 0
    4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4497743
    8192 | 17358
    16384 | 1592
    32768 | 2046
    65536 | 812
    131072 | 374
    262144 | 0
    """

    Most "open"s finish in 4 microseconds, but we have a handful of "slow" openings of hundred of microseconds.

    Anyway, arguing about the GIL here is a non-issue, since the "open" is only done at object creation, and a sane program will only create a handful of devpoll objects.

    Let's see now the write:

    """
    syscall::open*:entry
    /copyinstr(arg0)=="/dev/poll"/
    {
    self->ts = timestamp;
    }

    syscall::open*:return
    /self->ts/
    {
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
    }

    """

    The results for a single descriptor registered:

    """
    value ------------- Distribution ------------- count
    256 | 0
    512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4102701
    1024 | 28514
    2048 | 11269
    4096 | 537
    8192 | 245
    16384 | 377
    32768 | 193
    65536 | 134
    131072 | 71
    262144 | 0
    """

    Most writes are really fast, around half a microsecond, but there are sporadic latencies of hundreds of microseconds.

    Re-registering 200 sockets per loop iteration, I get:

    """
    value ------------- Distribution ------------- count
    512 | 0
    1024 | 50
    2048 | 94
    4096 | 157
    8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1112314
    16384 |@ 22977
    32768 | 1584
    65536 | 842
    131072 | 168
    262144 | 0
    """

    Most writes take around 8 microseconds.

    So now the problem is to estimate how much time I need to release and reacquire the GIL. For that, and while we don't have DTrace integration in Python (my next project, I hope), I changed the source code to add a manual probe using the Solaris high-resolution timers. I get a median time of around 550 nanoseconds (half a microsecond), with a few spikes, when the GIL is not contended.

    So, unlocking the GIL adds around half a microsecond to the syscalls. That is fast, but comparable to the actual duration of the syscall. Releasing the GIL roughly doubles the time needed to register/unregister a socket (assuming we handle a single socket per iteration, which is realistic in real code).

    Being realistic, any work we do with the descriptors (like actually reading from or writing to them) is going to make this 0.5 microsecond gain pointless. By freeing the GIL, we are protected from any kernel contention, and we are playing by the rules :-).

    Anyway I am open to feedback.

    PS: I also checked with GIL contention, and the results are comparable. That is surprising; maybe it is related to the GIL improvements in 3.3. I haven't investigated the issue further.

    @jcea (Member, Author) commented Nov 9, 2011

    New changeset, after Ross's feedback. Thanks!

    @rosslagerwall (Mannequin) commented Nov 9, 2011

    That was thorough :-) Seems OK though.

    + if (n < size) {
    + PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
    + "Please, report in http://bugs.python.org/");

    If n < size, it's not a Python error, is it? I would say it's the OS's fault.

    Otherwise, it looks good... although I don't currently have access to a Solaris box to test it.

    @jcea (Member, Author) commented Nov 9, 2011

    The timing for the GIL I am providing covers both releasing and reacquiring, that is, all the work. In fact, the measurement keeps the syscall inside the release/acquire window, to account for cache effects.

    That is, between the release and the acquire there is a real syscall. Of course, I am not timing the syscall itself.

    So, the timing I am providing is accurate.

    @pitrou (Member) commented Nov 9, 2011

    That was thorough :-) Seems OK though.

    + if (n < size) {
    +     PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
    +             "Please, report in http://bugs.python.org/");

    If n < size, it's not a Python error, is it? I would say it's the OS's fault.

    No, but it's a Python error if Python wrongly assumes that write() won't
    return a partial result. Ideally write() should be retried in a loop
    until everything is written out.
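
    A minimal sketch of that retry loop, assuming the driver accepts a write resumed at an arbitrary byte offset (which, as discussed below, is not obvious for structured data); the function name is illustrative:

    """
    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>

    /* Keep writing until the whole pollfd array has been handed to
    ** /dev/poll, resuming after partial writes and retrying on EINTR. */
    static int
    devpoll_flush(int dpfd, struct pollfd *fds, size_t n)
    {
        char *p = (char *)fds;
        size_t remaining = n * sizeof(struct pollfd);

        while (remaining > 0) {
            ssize_t written = write(dpfd, p, remaining);
            if (written < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted before writing: retry */
                return -1;      /* real error: let the caller raise */
            }
            p += written;
            remaining -= (size_t)written;
        }
        return 0;
    }
    """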

    @jcea (Member, Author) commented Nov 9, 2011

    The problem with partial writes is that the data is not an unstructured stream of bytes, but a concrete binary structure. You cannot simply retry.

    My bet is that "/dev/poll" never does partial writes.

    If I am mistaken, I will bug the Illumos people to help me solve it. So far, this seems a theoretical problem.

    Just in case it is not, we raise an exception and urge the programmer to report the issue to us.

    The other option would be to try to be clever and do things like retrying when the partial write is a multiple of the struct size, but what if it is not?

    Instead of guessing and playing games, I would rather know whether this is actually a problem in the wild. In my tests, with 65530 sockets, I never saw this "impossible" exception.

    @jcea (Member, Author) commented Nov 14, 2011

    New changeset. The only change is:

    """
    diff --git a/Modules/selectmodule.c b/Modules/selectmodule.c
    --- a/Modules/selectmodule.c
    +++ b/Modules/selectmodule.c
    @@ -685,8 +685,16 @@
             return -1;
         }
         if (n < size) {
    -        PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
    -                "Please, report in http://bugs.python.org/");
    +        /*
    +        ** Data written to /dev/poll is a binary data structure. It is not
    +        ** clear what to do if a partial write occurred. For now, raise
    +        ** an exception and see if we actually find this problem in
    +        ** the wild.
    +        */
    +        PyErr_Format(PyExc_IOError, "failed to write all pollfds. "
    +                "Please, report at http://bugs.python.org/. "
    +                "Data to report: Size tried: %d, actual size written: %d",
    +                size, n);
             return -1;
         }
         return 0;
    """

    If there are no disagreements, I will commit to default (3.3) soon (in a few hours or a day).

    Thanks!

    @python-dev (Mannequin) commented Nov 14, 2011

    New changeset 8f7ab4bf7ad9 by Jesus Cea in branch 'default':
    Issue bpo-6397: Support '/dev/poll' polling objects in select module, under Solaris & derivatives.
    http://hg.python.org/cpython/rev/8f7ab4bf7ad9

    @jcea closed this as completed Nov 14, 2011
    @ezio-melotti transferred this issue from another repository Apr 10, 2022