This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: select.epoll.poll may behave differently if timeout = -1 vs timeout = None
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Gabriel McManus, berker.peksag, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-01-29 06:28 by Gabriel McManus, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9040 merged berker.peksag, 2018-09-03 09:13
Messages (6)
msg286431 - (view) Author: Gabriel McManus (Gabriel McManus) Date: 2017-01-29 06:28
The select module epoll.poll method takes a "timeout" parameter which is documented as having a default value of -1 [1]. If no timeout (or None) is passed to epoll.poll, then a value of -1 is passed to the epoll_wait system call. But if a timeout of -1 is passed to epoll.poll, then a value of -1000 is passed to epoll_wait. This is because the timeout is converted from seconds to milliseconds.
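The conversion described above can be modelled in a few lines of Python. This is only a sketch of the reported behaviour, not the C code in selectmodule.c: the name `epoll_timeout_ms_py35` is hypothetical, and rounding up to whole milliseconds is an assumption.

```python
import math


def epoll_timeout_ms_py35(timeout=None):
    """Model the value handed to epoll_wait() as described in this report.

    Hypothetical helper for illustration; ceiling rounding is an assumption.
    """
    if timeout is None:
        # No timeout given: -1 is passed straight through (block forever).
        return -1
    # Seconds are scaled to milliseconds, and negative values are scaled too.
    return math.ceil(timeout * 1000)


for value in (None, -1, 0, 0.5):
    print(value, "->", epoll_timeout_ms_py35(value))
```

Under this model, `None` and `-1` reach the system call as different values, which is the discrepancy reported here.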

Before Python 3.5, if a negative timeout was passed to epoll.poll then -1 was passed to epoll_wait [2].

The Linux epoll_wait documentation doesn't specify the behaviour if timeout < -1. Linux itself behaves the same for all negative timeouts: never time out. But on Illumos, timeout < -1 currently times out immediately, and only timeout == -1 never times out [3].

Some code does pass -1 to select.epoll.poll expecting it to never time out. For example, selectors.EpollSelector [4].

I suggest restoring the pre-3.5 behaviour: epoll.poll should use -1 if the given timeout is negative.
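A minimal sketch of that suggestion, written in Python for readability. It is not the actual patch: `epoll_timeout_ms_clamped` is a hypothetical name, and rounding up to whole milliseconds is an assumption.

```python
import math


def epoll_timeout_ms_clamped(timeout=None):
    """Map a timeout in seconds to the millisecond value for epoll_wait().

    Hypothetical helper: every negative timeout collapses to -1, the only
    special value that epoll_wait() documents.
    """
    if timeout is None or timeout < 0:
        return -1
    # Non-negative timeouts are still converted from seconds to milliseconds.
    return math.ceil(timeout * 1000)
```

With this normalisation, `timeout=-1`, `timeout=-0.5` and `timeout=None` would all reach the system call as -1.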

I discovered this because ipython3 uses selectors.EpollSelector on Illumos,
and it was using 100% cpu while waiting for input because epoll_wait was returning immediately.

To demonstrate the issue you can run:

    strace python3.5 -c 'import select; select.epoll().poll(timeout=-1)' &

On Illumos this completes immediately, and the output contains the -1000 timeout:

    epoll_wait(3, [], 1023, -1000)          = 0

On Linux, it will block. If you then kill the python process with SIGTERM, strace should print the interrupted epoll_wait call, revealing the -1000 timeout:

    epoll_wait(3, 
    ...
    299a070, 1023, -1000)        = -1 EINTR (Interrupted system call)
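Where strace is not available, the same symptom could be probed from Python alone. The sketch below is an assumption-laden alternative to the commands above, not part of the original report: `probe_negative_timeout` is a hypothetical helper that makes a pipe readable after a short delay and measures how long `poll(timeout=-1)` takes to return.

```python
import os
import select
import threading
import time


def probe_negative_timeout(delay=0.2):
    """Return (elapsed_seconds, events) for epoll.poll(timeout=-1).

    Hypothetical helper: a pipe becomes readable after `delay` seconds, so
    a poll that really blocks returns one event after roughly that long.
    """
    if not hasattr(select, "epoll"):
        raise RuntimeError("select.epoll is not available on this platform")
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        ep.register(read_fd, select.EPOLLIN)
        # Write one byte from a timer thread once the delay has passed.
        timer = threading.Timer(delay, os.write, args=(write_fd, b"x"))
        timer.start()
        start = time.monotonic()
        events = ep.poll(timeout=-1)
        elapsed = time.monotonic() - start
        timer.join()  # make sure the write has happened before closing
        return elapsed, events
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__" and hasattr(select, "epoll"):
    print(probe_negative_timeout())
```

An empty event list with a near-zero elapsed time would suggest that the negative timeout was treated as "return immediately"; one event after roughly `delay` seconds would suggest the call blocked as expected.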

[1] https://github.com/python/cpython/blob/b9e40ed1bcce127893e40dd355087cda7187ac27/Modules/selectmodule.c#L1489
[2] https://github.com/python/cpython/commit/02e27d1301ea680dce9c3013010e3befedf9628a
[3] https://github.com/joyent/illumos-joyent/issues/136
[4] https://github.com/python/cpython/blob/8228a2b306844a213eddb4fb908c1925840ff67e/Lib/selectors.py#L428
msg286479 - (view) Author: Gabriel McManus (Gabriel McManus) Date: 2017-01-30 09:55
As mentioned in [1], Illumos will be fixed to match Linux's behaviour, so this problem will go away. It may still be worth changing epoll to just send -1 though, in case this causes similar issues in other operating systems.

[1] https://github.com/joyent/illumos-joyent/issues/136
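On interpreters that still scale negative timeouts, calling code could avoid the -1000 value by passing None, which the original report says results in -1 reaching epoll_wait. The wrapper below is a hedged sketch of that caller-side workaround; `poll_blocking` is a hypothetical name and not part of the select module.

```python
def poll_blocking(poller, timeout=None):
    """Call poller.poll(), turning any negative timeout into None.

    Hypothetical wrapper: `poller` is assumed to be a select.epoll object
    (or anything with a compatible poll() method).
    """
    if timeout is not None and timeout < 0:
        # None means "no timeout", so no seconds-to-milliseconds scaling
        # is applied to a negative number.
        timeout = None
    return poller.poll(timeout)
```

Usage would look like `poll_blocking(select.epoll(), -1)`; zero and positive timeouts are passed through unchanged.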
msg323684 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2018-08-17 22:44
Thanks for the report.

Looking at the kernel source code, there doesn't seem to be any difference between -1, -100, or -255: https://github.com/torvalds/linux/blob/9bd553929f68921be0f2014dd06561e0c8249a0d/fs/eventpoll.c#L1747-L1761

Do you know of any OS other than Illumos that implements or mimics epoll()? Since https://github.com/joyent/illumos-joyent/commit/d21b3b2e1bbefbd2f6158ed5d329cd58f86677ab, Illumos follows Linux's behavior, so I'm not sure whether we should do something similar to https://github.com/python/cpython/commit/6cfa927ceb931ad968b5b03e4a2bffb64a8a0604 for epoll.poll().
msg324489 - (view) Author: Gabriel McManus (Gabriel McManus) Date: 2018-09-03 02:17
I don't know of any other OS that implements epoll, so this issue is likely no longer a problem. Although it is strange to convert -1 to -1000 (as though from seconds to milliseconds), it may not be worth changing.
msg324493 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-03 05:26
See the similar issue31334. It may be worth making the code similar. Only -1 is documented as a special timeout value for epoll_wait().
msg325035 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2018-09-11 17:29
New changeset b690b9b04729ba3d91c59bff1bb11c3dcc1b50fc by Berker Peksag in branch 'master':
bpo-29386: Pass -1 to epoll_wait() when timeout is < -1 (GH-9040)
https://github.com/python/cpython/commit/b690b9b04729ba3d91c59bff1bb11c3dcc1b50fc
History
Date User Action Args
2022-04-11 14:58:42 admin set github: 73572
2018-09-11 17:30:58 berker.peksag set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: - Python 3.6, Python 3.7
2018-09-11 17:29:52 berker.peksag set messages: + msg325035
2018-09-03 09:13:24 berker.peksag set keywords: + patch
stage: patch review
pull_requests: + pull_request8502
2018-09-03 05:26:11 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg324493
2018-09-03 02:17:08 Gabriel McManus set messages: + msg324489
2018-08-17 22:44:10 berker.peksag set nosy: + berker.peksag
messages: + msg323684
versions: + Python 3.8, - Python 3.5
2017-01-30 09:55:36 Gabriel McManus set messages: + msg286479
2017-01-29 06:28:41 Gabriel McManus create