classification
Title: race condition in initstdio() (python aborts running under nohup)
Type: crash Stage:
Components: Interpreter Core Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Yi Ding, mpaolini, ned.deily, python-dev, rbcollins, ronaldoussoren, vstinner
Priority: low Keywords: patch

Created on 2015-08-18 22:18 by Yi Ding, last changed 2015-09-04 20:23 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
test.py Yi Ding, 2015-08-18 22:18
test.sh Yi Ding, 2015-08-18 22:19
issue24891.patch mpaolini, 2015-09-04 11:08 review
issue24891_2.patch mpaolini, 2015-09-04 13:39 review
issue24891_3.patch vstinner, 2015-09-04 14:24 review
Messages (19)
msg248797 - (view) Author: Yi Ding (Yi Ding) Date: 2015-08-18 22:18
Looks like this bug https://bugs.python.org/issue7111 has resurfaced in python3 (python 2.6 works as far as I can tell) at least on Macs.

I've attached a simple test script.

Steps:
1. SSH to remote server.
2. Run nohup ./test.sh &
3. exit SSH.
4. SSH back in and see that there are a bunch of errors in your nohup.out file.
msg248798 - (view) Author: Yi Ding (Yi Ding) Date: 2015-08-18 22:19
test.sh attached
msg248800 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2015-08-18 22:53
What sort of errors?
msg248801 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-18 23:00
I'm unable to reproduce the issue on Linux with Python 3.4.2.
msg248803 - (view) Author: Yi Ding (Yi Ding) Date: 2015-08-18 23:24
Fatal Python error: Py_Initialize: can't initialize sys standard streams
OSError: [Errno 9] Bad file descriptor
./test.sh: line 4: 49632 Abort trap: 6           python3 test.py -o hello.txt
msg248804 - (view) Author: Yi Ding (Yi Ding) Date: 2015-08-18 23:26
That's from my nohup.out. It might be a Mac OS specific thing.
msg248805 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-18 23:41
I can reproduce the bug with gdb if the file descriptor 0 is closed before calling:
    std = create_stdio(iomod, fd, 0, "<stdin>", encoding, errors);
and after the following lines were called:
    fd = fileno(stdin);
    if (!is_valid_fd(fd)) {

In initstdio () at Python/pylifecycle.c:1156.

create_stdio() fails in FileIO constructor, on the line:
   if (_Py_fstat(self->fd, &fdfstat) < 0)
of _io_FileIO___init___impl() at Modules/_io/fileio.c:480

It's a corner case of the issue #7111, I would call it a race condition.
msg248806 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-08-18 23:46
Python 2 is not affected, PyFile_FromFile() doesn't check if the file descriptor is valid. On Python 3, we call fstat() to check the block size of the file descriptor to optimize buffering.
msg249514 - (view) Author: Yi Ding (Yi Ding) Date: 2015-09-01 20:07
I'm not sure this is a race condition. There's a crash every single time python is started when running the test.sh script under nohup.

If it's a race condition, it should only happen some of the time, not every single time.

Can you reeevaluate the priority?
msg249753 - (view) Author: Marco Paolini (mpaolini) * Date: 2015-09-04 11:05
Attaching a patch. Does it make any sense?
msg249754 - (view) Author: Marco Paolini (mpaolini) * Date: 2015-09-04 11:08
ops wrong patch... trying again.
msg249756 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-04 12:12
Yes, it looks like a good approach. Some comments:

- please mention this issue "Issue #24891"  and explain the issue in a
comment in create_stdio_checked()
- it would be safer to check the exception type, to be extra safe
- you must clear the exception
- why not putting the code in create_stdio() directly?
msg249760 - (view) Author: Marco Paolini (mpaolini) * Date: 2015-09-04 13:39
@haypo thanks for the quick review. This new issue24891_2.patch covers all of the points you raised except the "check exception type" which I am still figuring out.
msg249766 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-04 14:24
> This new issue24891_2.patch covers all of the points you raised except the "check exception type" which I am still figuring out.

See my patch issue24891_3.patch which calls PyErr_ExceptionMatches(PyExc_OSError). If you like it, I can push it to Python 3.4-3.6.
msg249770 - (view) Author: Marco Paolini (mpaolini) * Date: 2015-09-04 14:53
@haypo, yeah, definitely better than mine! All good for me.
msg249773 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-04 15:31
New changeset e67bf9c9a898 by Victor Stinner in branch '3.4':
Fix race condition in create_stdio()
https://hg.python.org/cpython/rev/e67bf9c9a898
msg249775 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-04 15:35
> @haypo, yeah, definitely better than mine! All good for me.

Ok. I added your name to Misc/ACKS. I had to do some tricks to apply the patch to Python 3.4 (code was in Python/pythonrun.c) and then to merge to Python 3.5 (code moved to Python/pylifecycle.c). But I checked the fix in each version using gdb:

* put a breakpoint on create_stdio,
* type "print close(0)" after the first is_valid_fd() check,
* see that the open() exception is correctly handled (type "next", "next", ... and check that we go to the error: label and then enter the if() block)

Thanks for your patch Marco.

I prefer to be extra safe by checking the raised exception to minimize the risk of regression.
msg249823 - (view) Author: Yi Ding (Yi Ding) Date: 2015-09-04 20:12
Thank you everyone! I will test the next rc of 3.5.
msg249824 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-04 20:23
I don't plan to send a pull request to the Python 3.5 release manager.
This bug existed since Python 3.0, the fix can wait for Python 3.5.1.
History
Date User Action Args
2015-09-04 20:23:28vstinnersetmessages: + msg249824
2015-09-04 20:12:01Yi Dingsetmessages: + msg249823
2015-09-04 15:35:10vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg249775
2015-09-04 15:31:53python-devsetnosy: + python-dev
messages: + msg249773
2015-09-04 14:53:07mpaolinisetmessages: + msg249770
2015-09-04 14:24:17vstinnersetfiles: + issue24891_3.patch

messages: + msg249766
2015-09-04 13:39:46mpaolinisetfiles: + issue24891_2.patch

messages: + msg249760
2015-09-04 12:12:55vstinnersetmessages: + msg249756
2015-09-04 11:08:57mpaolinisetfiles: - issue24891.patch
2015-09-04 11:08:46mpaolinisetfiles: + issue24891.patch

messages: + msg249754
2015-09-04 11:05:56mpaolinisetfiles: + issue24891.patch

nosy: + mpaolini
messages: + msg249753

keywords: + patch
2015-09-01 20:07:34Yi Dingsetmessages: + msg249514
2015-08-19 05:28:49vstinnersetcomponents: + Interpreter Core, - macOS
2015-08-19 05:28:29vstinnersettitle: python aborts running under nohup -> race condition in initstdio() (python aborts running under nohup)
2015-08-19 05:28:12vstinnersetpriority: normal -> low
2015-08-18 23:46:48vstinnersetmessages: + msg248806
versions: + Python 3.5, Python 3.6
2015-08-18 23:41:14vstinnersetmessages: + msg248805
2015-08-18 23:26:32Yi Dingsetmessages: + msg248804
2015-08-18 23:24:34Yi Dingsetmessages: + msg248803
2015-08-18 23:00:50vstinnersetnosy: + vstinner, ned.deily, ronaldoussoren
messages: + msg248801
components: + macOS
2015-08-18 22:53:14rbcollinssetnosy: + rbcollins
messages: + msg248800
2015-08-18 22:19:03Yi Dingsetfiles: + test.sh

messages: + msg248798
2015-08-18 22:18:48Yi Dingcreate