classification
Title: Fatal Python error: Py_Initialize: can't initialize sys standard streams
Type: crash Stage: resolved
Components: FreeBSD, IO Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: izbyshev, koobs, rudolphf, vstinner, Владислав Ярмак
Priority: normal Keywords: patch

Created on 2018-02-15 09:20 by rudolphf, last changed 2019-04-17 19:04 by rudolphf. This issue is now closed.

Files
File name Uploaded Description
alles-23978 rudolphf, 2018-02-15 09:20 ktrace output
fstat0.c izbyshev, 2018-02-19 20:01
repro.c izbyshev, 2018-02-20 00:43
Pull Requests
URL Status Linked
PR 5773 closed izbyshev, 2018-02-20 14:32
PR 12852 merged vstinner, 2019-04-16 09:56
PR 12863 merged miss-islington, 2019-04-17 16:09
Messages (17)
msg312194 - Author: Rudolph Froger (rudolphf) Date: 2018-02-15 09:20
Sometimes a new Python 3.6.4 process is aborted by the kernel (FreeBSD 11.1) (before loading my Python files).

Found in syslog:

kernel: pid 22433 (python3.6), uid 2014: exited on signal 6 (core dumped)
Fatal Python error: Py_Initialize: can't initialize sys standard streams
OSError: [Errno 9] Bad file descriptor

I've been able to run ktrace on such a Python process, see attachment. See around line 940: "RET   fstat -1 errno 9 Bad file descriptor"
msg312282 - Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-02-17 15:42
ktrace shows that dup(0) succeeded but fstat(0) failed. The symptom is the same as in #30225. Could you check whether any of the following quick tests produces the same error?

python3 -c 'import os, subprocess, sys; r, w = os.pipe(); os.close(w); subprocess.call([sys.executable, "-c", ""], stdin=r)'

python3 -c 'import os, subprocess, sys; r, w = os.pipe(); os.close(r); subprocess.call([sys.executable, "-c", ""], stdin=w)'
msg312342 - Author: Rudolph Froger (rudolphf) Date: 2018-02-19 07:49
I've tried your quick tests a few times but couldn't reproduce it immediately. The problem is hard to reproduce anyway: launching Python processes can go well for a long time (many days, launching many processes every minute) until suddenly all NEW processes get aborted. It seems as if something in the relationship with the parent process goes wrong. I've seen it happen with Python as the parent process, but also with a plain shell process as the parent.
Just starting Python (python -c "x = 1") can be enough to trigger this so it's not something which can be blamed on some library.
msg312364 - Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-02-19 20:01
Thank you for checking. If this issue happens even when Python is run manually from an ordinary shell, fixing it in the same way as in #30225 is probably not what you want: the error message will be gone, but the corresponding std stream will be None (sys.stdin in the case that you ktrace'd). However, if fd 0 really becomes unusable for some reason, there isn't anything Python can do.

Given your description and ktrace log, I can't imagine why fd 0 would behave strangely only in Python. I've attached a small C program to check fd 0. Could you compile it and run it in an infinite loop from the shell in an attempt to reproduce this?
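
To illustrate what "the std stream will be None" means for user code, here is a minimal sketch. The helper name `read_text` is hypothetical and not part of CPython; it only assumes that `sys.stdin` may be None when fd 0 is unusable at startup.

```python
import sys


def read_text(stream, default=""):
    """Return everything readable from `stream`, or `default` if it is unusable.

    `stream` is expected to be sys.stdin, which may be None when fd 0
    was not valid at interpreter startup.
    """
    if stream is None:
        # No stream object was created for the descriptor.
        return default
    try:
        return stream.read()
    except OSError:
        # The descriptor may also become unusable after startup.
        return default
```

A caller would use `read_text(sys.stdin)` instead of calling `sys.stdin.read()` directly.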
msg312387 - Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-02-20 00:43
OK, never mind the test. I finally got access to a FreeBSD box and reproduced the problem. It has to do with the 'revoke' feature of *BSD. When revoke is called on a terminal device (as part of the logout process, for example), all descriptors associated with it are invalidated. They can be dup'ed, but any I/O (including fstat) will fail with EBADF. The attached 'repro.c' demonstrates the same behavior as Python in your ktrace log.

# sleep 5; ./repro >&err.txt &
# exit
(login again)
# cat err.txt
isatty: Inappropriate ioctl for device
dup ok: 3
fstat: Bad file descriptor
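
For readers without the attachment, the same three checks can be sketched in Python. This is not repro.c itself, only an assumed equivalent; `probe_fd` is a hypothetical helper name.

```python
import os


def probe_fd(fd):
    """Report how isatty(), dup() and fstat() each treat `fd`."""
    results = {}

    # os.isatty() returns False for an unusable descriptor instead of raising.
    results["isatty"] = os.isatty(fd)

    try:
        copy = os.dup(fd)
    except OSError as exc:
        results["dup"] = "failed: %s" % exc.strerror
    else:
        os.close(copy)
        results["dup"] = "ok"

    try:
        os.fstat(fd)
    except OSError as exc:
        results["fstat"] = "failed: %s" % exc.strerror
    else:
        results["fstat"] = "ok"

    return results
```

On a revoked terminal descriptor one would expect "dup" to be "ok" while "fstat" reports a failure, matching the transcript above. On an ordinary closed descriptor both calls fail.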

So it seems that in your case the parent of your Python processes passed a descriptor referring to the terminal as fd 0, and then the terminal got revoked at some point. Other people have stumbled on this too, for example: https://bitbucket.org/tildeslash/monit/issues/649/init_env-fails-if-open-2-returns-an

As for Python, it seems OK to fix it as in #30225 since the fd is unusable for I/O anyway. I think that we can even drop dup-based validation from is_valid_fd() since there is a corner case for Linux too: if a descriptor opened with O_PATH is inherited as a standard one, dup() will succeed but fstat() will fail in kernels before 3.6. And we do fstat() almost immediately after is_valid_fd() to get blksize, so the dup-based optimization doesn't seem worth the trouble.
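
The argument that one fstat() call can serve both purposes can be sketched as follows. This is an illustration only, not CPython's code: `block_size_or_none` is a hypothetical name, and falling back to `io.DEFAULT_BUFFER_SIZE` is an assumption made for the sketch.

```python
import io
import os


def block_size_or_none(fd):
    """Validate `fd` and pick a buffer size with a single fstat() call.

    Returns None when the descriptor is unusable, otherwise a positive
    buffer size.
    """
    try:
        st = os.fstat(fd)
    except OSError:
        # fstat() failed, so the descriptor cannot be used for I/O.
        return None
    # st_blksize is missing on some platforms and may be 0.
    return getattr(st, "st_blksize", 0) or io.DEFAULT_BUFFER_SIZE
```

With this shape there is no separate dup()-based validity check whose answer could disagree with the later fstat().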

Victor, do you have an opinion on that?
msg312388 - Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-02-20 00:48
> I think that we can even drop dup-based validation from is_valid_fd()

For POSIX, that is. There is no fstat on Windows, and dup is probably OK there (or, even better, dup2(fd, fd) -- no need to close).
msg312402 - Author: Rudolph Froger (rudolphf) Date: 2018-02-20 08:31
Thanks for all the research!

My crashing Python process is started by a shell process which is launched by the FreeBSD daemon tool; this might explain why stdin is no longer valid. But I'm not sure why it can be solved, sometimes, by restarting the daemon.
msg312418 - Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-02-20 14:29
> But I'm not sure why it can be solved, sometimes, by restarting the the daemon.

Could it be simply because daemon is respawned from a process that does have a valid stdin at the time of respawn?

Note that daemon has an option to redirect std streams to /dev/null.
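
When the parent is itself a Python program, an analogous precaution is to hand the child the null device as stdin rather than letting it inherit a terminal descriptor. A minimal sketch, with `run_detached_from_stdin` as a hypothetical helper name:

```python
import subprocess
import sys


def run_detached_from_stdin(code):
    """Run `code` in a child interpreter whose stdin is the null device."""
    return subprocess.run(
        [sys.executable, "-c", code],
        stdin=subprocess.DEVNULL,   # child never sees the parent's fd 0
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
```

The child then reads EOF from stdin immediately, whatever happens to the parent's terminal.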
msg312419 - Author: Rudolph Froger (rudolphf) Date: 2018-02-20 14:36
> Could it be simply because daemon is respawned from a process that does have a valid stdin at the time of respawn?

Yes, that could certainly be the case. Thanks!
msg339338 - Author: Владислав Ярмак (Владислав Ярмак) Date: 2019-04-02 14:48
I have a similar crash with Python 3.7.2 on Linux.

Steps to reproduce: send SIGINT while Python is initializing.

I've built a debug version of Python 3.7.2 and collected a core dump:

(gdb) thread apply all bt

Thread 1 (Thread 0x7f8f5ee67e80 (LWP 13285)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f8f5dfe742a in __GI_abort () at abort.c:89
#2  0x0000559515870286 in fatal_error (prefix=prefix@entry=0x5595159837f0 <__func__.14264> "init_sys_streams", msg=msg@entry=0x559515983480 "can't initialize sys standard streams", status=-1) at Python/pylifecycle.c:2179
#3  0x0000559515871062 in _Py_FatalInitError (err=...) at Python/pylifecycle.c:2198
#4  0x000055951577d1f5 in pymain_init (pymain=pymain@entry=0x7ffe886aafc0) at Modules/main.c:3019
#5  0x000055951577d215 in pymain_main (pymain=pymain@entry=0x7ffe886aafc0) at Modules/main.c:3032
#6  0x000055951577d29a in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:3072
#7  0x00005595157763e9 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15
msg340326 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-16 09:56
> ktrace shows that dup(0) succeeded but fstat(0) failed.

Aha, the problem is still the is_valid_fd() function:

    /* Prefer dup() over fstat(). fstat() can require input/output whereas
       dup() doesn't, there is a low risk of EMFILE/ENFILE at Python
       startup. */

The function has been fixed on macOS with:

#ifdef __APPLE__
    /* bpo-30225: On macOS Tiger, when stdout is redirected to a pipe
       and the other side of the pipe is closed, dup(1) succeed, whereas
       fstat(1, &st) fails with EBADF. Prefer fstat() over dup() to detect
       such error. */
    struct stat st;
    return (fstat(fd, &st) == 0);
#else

I see two options:

* Only use dup() on platforms where we know that dup() is enough to detect corner cases: Linux and Windows
* Force usage of fstat() on FreeBSD... But what about OpenBSD, NetBSD and other BSD variants?

I wrote attached PR 12852 to only use dup() on Linux and Windows.
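
The first option can be sketched in Python as below. This only illustrates the platform split described above and is not the C code from PR 12852; the `sys.platform` checks are an assumed stand-in for the C preprocessor conditions.

```python
import os
import sys


def is_valid_fd(fd):
    """Return True if `fd` looks usable, choosing the probe per platform."""
    if sys.platform.startswith("linux") or sys.platform == "win32":
        # dup() is assumed to be sufficient on these platforms.
        try:
            copy = os.dup(fd)
        except OSError:
            return False
        os.close(copy)
        return True

    # Elsewhere (FreeBSD, macOS, other BSDs, ...) dup() can succeed on a
    # descriptor that fails every later operation, so ask fstat() instead.
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True
```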
msg340329 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-16 10:12
Alexey Izbyshev: "I think that we can even drop dup-based validation from is_valid_fd() since there is a corner case for Linux too: if a descriptor opened with O_PATH is inherited as a standard one, dup() will succeed but fstat() will fail in kernels before 3.6. And we do fstat() almost immediately after is_valid_fd() to get blksize, so the dup-based optimization doesn't seem worth the trouble. Victor, do you have an opinion on that?"

I don't understand this case. I'm not familiar with O_PATH, nor with how such a special file descriptor would be inherited. Would you mind elaborating?

man open:

       O_PATH (since Linux 2.6.39)
              Obtain a file descriptor that can be used for two purposes: to
              indicate a location in the filesystem tree and to perform
              operations that act purely at the file descriptor level. The file
              itself is not opened, and other file operations (e.g., read(2),
              write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2))
              fail with the error EBADF.

In the following C program, fd 0 is a file descriptor opened with O_PATH: dup(0) and fstat(0) both succeed, which is not surprising, since it's a valid file descriptor.
---
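#define _GNU_SOURCE   /* expose O_PATH in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for old headers without O_PATH (Linux value on most architectures). */
#ifndef O_PATH
#define O_PATH 010000000
#endif

int main(void)
{
    int path_fd;

    /* Open the current directory as a path-only descriptor. */
    path_fd = open(".", O_PATH);
    if (path_fd < 0) {
        perror("open");
        return 1;
    }

    /* Install it as fd 0, as if it had been inherited as stdin. */
    if (dup2(path_fd, 0) < 0) {
        perror("dup2");
        return 1;
    }

    int fd = dup(0);
    if (fd < 0) {
        perror("dup");
    }
    else {
        fprintf(stderr, "dup ok: %d\n", fd);
        close(fd);
    }

    struct stat st;
    if (fstat(0, &st) < 0) {
        perror("fstat");
    }
    else {
        printf("fstat ok\n");
    }

    return 0;
}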
---
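
The same experiment can be sketched in Python, adding a read() call to show the EBADF behaviour quoted from the man page. `probe_o_path` is a hypothetical helper; it assumes a platform where `os.O_PATH` is defined and returns None elsewhere.

```python
import errno
import os


def probe_o_path(path="."):
    """Open `path` with O_PATH and report what dup(), fstat() and read() do."""
    if not hasattr(os, "O_PATH"):
        return None  # O_PATH is not available on this platform

    fd = os.open(path, os.O_PATH)
    results = {}
    try:
        checks = (
            ("dup", lambda: os.close(os.dup(fd))),
            ("fstat", lambda: os.fstat(fd)),
            ("read", lambda: os.read(fd, 1)),
        )
        for name, call in checks:
            try:
                call()
            except OSError as exc:
                # Record the symbolic errno name, e.g. "EBADF".
                results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
            else:
                results[name] = "ok"
    finally:
        os.close(fd)
    return results
```

On a recent Linux kernel, "dup" and "fstat" would be expected to report "ok" while "read" reports "EBADF".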
msg340412 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-17 16:09
New changeset 3092d6b2630e4d2bd200fbc3231c27a7cba4d6b2 by Victor Stinner in branch 'master':
bpo-32849: Fix is_valid_fd() on FreeBSD (GH-12852)
https://github.com/python/cpython/commit/3092d6b2630e4d2bd200fbc3231c27a7cba4d6b2
msg340413 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-17 16:30
New changeset b87a8073db73f9ffa96104e00c624052e34b11c7 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
bpo-32849: Fix is_valid_fd() on FreeBSD (GH-12852) (GH-12863)
https://github.com/python/cpython/commit/b87a8073db73f9ffa96104e00c624052e34b11c7
msg340414 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-17 16:33
In short, Python 2.7 doesn't seem to be affected by fstat/dup issues.

Python 2.7 doesn't check whether file descriptors 0, 1 and 2 are valid at startup. Python 2 uses PyFile_FromFile() to create sys.stdin, sys.stdout and sys.stderr, which creates a "file" object. The function calls fstat(fd) but ignores any error: fstat() is only used to fail if the fd is a directory.

Python 2.7 doesn't have the is_valid_fd() function.
msg340415 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-17 16:36
Thanks Rudolph Froger for the bug report: the issue is now fixed in 3.7 and master (future Python 3.8) branches. Sorry for the delay.

--

Alexey Izbyshev wrote PR 5773 to also use fstat() on Linux.

I chose to merge my PR 12852 which is more conservative: it keeps dup() on Linux. I'm not sure why exactly, but I recall that the author of the function, Antoine Pitrou, wanted to use dup() on Linux.

I'm not convinced by the O_PATH issue on Linux (described above), so I merged my conservative change instead.

Later, we can still move to fstat() on Linux as well if someone comes up with a more concrete example against dup().
msg340437 - Author: Rudolph Froger (rudolphf) Date: 2019-04-17 19:04
Thanks all for the fixes!
History
Date                 User             Action  Args
2019-04-17 19:04:09  rudolphf         set     messages: + msg340437
2019-04-17 16:36:19  vstinner         set     status: open -> closed; resolution: fixed; messages: + msg340415; stage: patch review -> resolved
2019-04-17 16:33:47  vstinner         set     messages: + msg340414
2019-04-17 16:30:38  vstinner         set     messages: + msg340413
2019-04-17 16:09:25  vstinner         set     messages: + msg340412
2019-04-17 16:09:25  miss-islington   set     pull_requests: + pull_request12790
2019-04-16 10:12:31  vstinner         set     messages: + msg340329
2019-04-16 09:56:57  vstinner         set     messages: + msg340326
2019-04-16 09:56:39  vstinner         set     pull_requests: + pull_request12778
2019-04-02 14:48:52  Владислав Ярмак  set     nosy: + Владислав Ярмак; messages: + msg339338
2018-02-20 14:36:50  rudolphf         set     messages: + msg312419
2018-02-20 14:32:15  izbyshev         set     keywords: + patch; stage: patch review; pull_requests: + pull_request5552
2018-02-20 14:29:23  izbyshev         set     messages: + msg312418
2018-02-20 08:31:15  rudolphf         set     messages: + msg312402
2018-02-20 00:48:16  izbyshev         set     messages: + msg312388
2018-02-20 00:43:03  izbyshev         set     files: + repro.c; versions: + Python 3.7; nosy: + koobs; messages: + msg312387; components: + IO, FreeBSD
2018-02-19 20:01:11  izbyshev         set     files: + fstat0.c; messages: + msg312364
2018-02-19 07:49:23  rudolphf         set     messages: + msg312342
2018-02-17 15:42:02  izbyshev         set     nosy: + vstinner, izbyshev; messages: + msg312282
2018-02-15 09:28:51  rudolphf         set     type: crash
2018-02-15 09:20:30  rudolphf         create