Fatal Python error: Py_Initialize: can't initialize sys standard streams #77030

Closed
rudolphfroger mannequin opened this issue Feb 15, 2018 · 17 comments
Labels
3.7 (EOL) end of life topic-IO type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@rudolphfroger
Mannequin

rudolphfroger mannequin commented Feb 15, 2018

BPO 32849
Nosy @vstinner, @koobs, @izbyshev, @corona10, @rudolphfroger
PRs
  • bpo-32849: Always use fstat() to validate fds on POSIX #5773
  • bpo-32849: Fix is_valid_fd() on FreeBSD #12852
  • [3.7] bpo-32849: Fix is_valid_fd() on FreeBSD (GH-12852) #12863
  • bpo-45919: Use WinAPI GetFileType() in is_valid_fd() #30082
Files
  • alles-23978: ktrace output
  • fstat0.c
  • repro.c

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-17.16:36:19.673>
    created_at = <Date 2018-02-15.09:20:30.354>
    labels = ['3.7', 'expert-IO', 'type-crash']
    title = "Fatal Python error: Py_Initialize: can't initialize sys standard streams"
    updated_at = <Date 2021-12-13.12:52:34.496>
    user = 'https://github.com/rudolphfroger'

    bugs.python.org fields:

    activity = <Date 2021-12-13.12:52:34.496>
    actor = 'corona10'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-04-17.16:36:19.673>
    closer = 'vstinner'
    components = ['IO', 'FreeBSD']
    creation = <Date 2018-02-15.09:20:30.354>
    creator = 'rudolphf'
    dependencies = []
    files = ['47444', '47451', '47452']
    hgrepos = []
    issue_num = 32849
    keywords = ['patch']
    message_count = 17.0
    messages = ['312194', '312282', '312342', '312364', '312387', '312388', '312402', '312418', '312419', '339338', '340326', '340329', '340412', '340413', '340414', '340415', '340437']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'koobs', 'izbyshev', 'corona10', 'rudolphf', 'Владислав Ярмак']
    pr_nums = ['5773', '12852', '12863', '30082']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue32849'
    versions = ['Python 3.6', 'Python 3.7']

    @rudolphfroger
    Mannequin Author

    rudolphfroger mannequin commented Feb 15, 2018

    Sometimes a new Python 3.6.4 process is aborted by the kernel (FreeBSD 11.1) (before loading my Python files).

    Found in syslog:

    kernel: pid 22433 (python3.6), uid 2014: exited on signal 6 (core dumped)
    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    OSError: [Errno 9] Bad file descriptor

    I've been able to run ktrace on such a Python process, see attachment. See around line 940: "RET fstat -1 errno 9 Bad file descriptor"

    @rudolphfroger rudolphfroger mannequin added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 15, 2018
    @izbyshev
    Mannequin

    izbyshev mannequin commented Feb 17, 2018

    ktrace shows that dup(0) succeeded but fstat(0) failed. The symptom is the same as in bpo-30225. Could you check whether any of the following quick tests produces the same error?

    python3 -c 'import os, subprocess, sys; r, w = os.pipe(); os.close(w); subprocess.call([sys.executable, "-c", ""], stdin=r)'

    python3 -c 'import os, subprocess, sys; r, w = os.pipe(); os.close(r); subprocess.call([sys.executable, "-c", ""], stdin=w)'
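The two one-liners above can be expanded into a small script that is easier to run in a loop. This is only a sketch: the helper name `run_child_with_half_closed_pipe` is made up for illustration. A negative return value is how `subprocess` reports a child killed by a signal, so an abort like the one in the syslog excerpt would show up as -6.

```python
import os
import subprocess
import sys


def run_child_with_half_closed_pipe(close_write_end: bool) -> int:
    """Start an empty Python child whose stdin is one end of a pipe
    whose other end has already been closed; return its exit status."""
    r, w = os.pipe()
    if close_write_end:
        os.close(w)
        child_stdin = r
    else:
        os.close(r)
        child_stdin = w
    try:
        # The child only has to get through interpreter startup.
        return subprocess.call([sys.executable, "-c", ""], stdin=child_stdin)
    finally:
        os.close(child_stdin)


if __name__ == "__main__":
    for close_write_end in (True, False):
        status = run_child_with_half_closed_pipe(close_write_end)
        print("close_write_end=%s -> exit status %d" % (close_write_end, status))
```

An exit status of 0 in both cases means the symptom from bpo-30225 does not reproduce on that machine.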

    @rudolphfroger
    Mannequin Author

    rudolphfroger mannequin commented Feb 19, 2018

    I've tried your quick tests a few times but couldn't reproduce it immediately. The problem is hard to reproduce anyway: launching Python processes can go well for a long time (many days, with many processes launched every minute) until suddenly all NEW processes get aborted. It seems as if something in the relation to the parent process goes wrong. I've seen it happen with Python as the parent process but also with a plain shell process as the parent.
    Just starting Python (python -c "x = 1") can be enough to trigger this, so it can't be blamed on some library.

    @izbyshev
    Mannequin

    izbyshev mannequin commented Feb 19, 2018

    Thank you for checking. If this issue happens even when Python is run manually from an ordinary shell, fixing it in the same way as in bpo-30225 is probably not what you want: the error message will be gone, but the corresponding std stream will be None (sys.stdin in the case that you ktrace'd). However, if fd 0 really becomes unusable for some reason, there isn't anything Python can do.
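To illustrate what "the std stream will be None" looks like from Python code, here is a minimal sketch that starts a child interpreter with fd 0 closed, using the POSIX shell redirection `<&-`. The names `CHILD_CODE` and `stdin_is_none_when_fd0_closed` are hypothetical, and the sketch assumes a POSIX `sh` is on the PATH.

```python
import subprocess
import sys

# Child code: report whether the interpreter created a sys.stdin object.
CHILD_CODE = "import sys; print(sys.stdin is None)"


def stdin_is_none_when_fd0_closed() -> bool:
    """Run a child interpreter with fd 0 closed and return its answer."""
    # In `sh -c script arg0 arg1`, "$0" is arg0 and "$1" is arg1.
    script = 'exec "$0" -c "$1" <&-'
    proc = subprocess.run(
        ["sh", "-c", script, sys.executable, CHILD_CODE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout.strip() == "True"


if __name__ == "__main__":
    print("sys.stdin is None in child:", stdin_is_none_when_fd0_closed())
```

Code that may run in such an environment has to check `sys.stdin is not None` before reading from it.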

    Given your description and ktrace log, I can't imagine why fd 0 would behave strangely only in Python. I've attached a small C program to check fd 0. Could you compile it and run it in an infinite loop from the shell in an attempt to reproduce this?

    @izbyshev
    Mannequin

    izbyshev mannequin commented Feb 20, 2018

    OK, never mind the test. I've finally got to a FreeBSD box and reproduced the problem. It has to do with the 'revoke' feature of *BSD. When revoke is called on a terminal device (as part of the logout process, for example), all descriptors associated with it are invalidated. They can be dup'ed, but any I/O (including fstat) will fail with EBADF. The attached 'repro.c' demonstrates the same behavior as Python in your ktrace log.

    # sleep 5; ./repro >&err.txt &
    # exit
    (login again)
    # cat err.txt
    isatty: Inappropriate ioctl for device
    dup ok: 3
    fstat: Bad file descriptor

    So it seems that in your case the parent of your Python processes passed a descriptor referring to the terminal as fd 0, and then the terminal got revoked at some point. Other people have stumbled on that too, for example: https://bitbucket.org/tildeslash/monit/issues/649/init_env-fails-if-open-2-returns-an

    As for Python, it seems OK to fix it as in bpo-30225 since the fd is unusable for I/O anyway. I think that we can even drop dup-based validation from is_valid_fd() since there is a corner case for Linux too: if a descriptor opened with O_PATH is inherited as a standard one, dup() will succeed but fstat() will fail in kernels before 3.6. And we do fstat() almost immediately after is_valid_fd() to get blksize, so the dup-based optimization doesn't seem worth the trouble.

    Victor, do you have an opinion on that?
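The two validation strategies discussed here can be modelled in Python with `os.dup()` and `os.fstat()`. This is a sketch for comparison only, not the C code of is_valid_fd(); the function names `valid_by_dup` and `valid_by_fstat` are made up.

```python
import os


def valid_by_dup(fd: int) -> bool:
    """dup()-based check: succeeds whenever fd is an open descriptor slot."""
    try:
        fd2 = os.dup(fd)
    except OSError:
        return False
    os.close(fd2)
    return True


def valid_by_fstat(fd: int) -> bool:
    """fstat()-based check: also fails when the descriptor cannot be queried."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    r, w = os.pipe()
    os.close(w)
    print("open fd:  ", valid_by_dup(r), valid_by_fstat(r))
    os.close(r)
    print("closed fd:", valid_by_dup(r), valid_by_fstat(r))
```

On an ordinary open or closed descriptor the two checks agree. The cases described in this thread (a revoked terminal on *BSD) are the ones where the dup()-based check would return True while the fstat()-based check returns False.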

    @izbyshev izbyshev mannequin added topic-IO 3.7 (EOL) end of life labels Feb 20, 2018
    @izbyshev
    Mannequin

    izbyshev mannequin commented Feb 20, 2018

    I think that we can even drop dup-based validation from is_valid_fd()

    For POSIX, that is. There is no fstat on Windows, and dup is probably OK there (or, even better, dup2(fd, fd) -- no need to close).
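The dup2(fd, fd) idea can be sketched with `os.dup2()`. When both arguments are the same valid descriptor, POSIX dup2() does nothing and returns it, so there is no duplicate to close. The helper name `valid_by_dup2` is hypothetical, and this sketch has only been reasoned about for POSIX; whether it is the right choice on Windows is the open question of the comment above.

```python
import os


def valid_by_dup2(fd: int) -> bool:
    """Validate fd with dup2(fd, fd): no new descriptor is created."""
    try:
        os.dup2(fd, fd)
    except OSError:
        # EBADF is expected here when fd is not an open descriptor.
        return False
    return True


if __name__ == "__main__":
    r, w = os.pipe()
    print("open fd:  ", valid_by_dup2(r))
    os.close(r)
    print("closed fd:", valid_by_dup2(r))
    os.close(w)
```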

    @rudolphfroger
    Mannequin Author

    rudolphfroger mannequin commented Feb 20, 2018

    Thanks for all the research!

    My crashing Python process is started by a shell process which is launched by the FreeBSD daemon tool; this might explain why stdin is no longer valid. But I'm not sure why it can be solved, sometimes, by restarting the daemon.

    @izbyshev
    Mannequin

    izbyshev mannequin commented Feb 20, 2018

    But I'm not sure why it can be solved, sometimes, by restarting the daemon.

    Could it be simply because daemon is respawned from a process that does have a valid stdin at the time of respawn?

    Note that daemon has an option to redirect std streams to /dev/null.
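When the parent is itself a Python program, the same precaution can be taken with `subprocess.DEVNULL`, so the child never inherits a terminal descriptor that could later be revoked. A minimal sketch, with the hypothetical names `CHILD_CODE` and `run_detached_from_terminal`:

```python
import subprocess
import sys

# Child code: check that fd 0 is usable and whether it is a terminal.
CHILD_CODE = (
    "import os, sys; os.fstat(0); "
    "print(sys.stdin is not None, os.isatty(0))"
)


def run_detached_from_terminal() -> str:
    """Start a child with stdin and stderr redirected to the null device."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    # First value: sys.stdin exists; second value: fd 0 is a terminal.
    print(run_detached_from_terminal())
```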

    @rudolphfroger
    Mannequin Author

    rudolphfroger mannequin commented Feb 20, 2018

    Could it be simply because daemon is respawned from a process that does have a valid stdin at the time of respawn?

    Yes, that could certainly be the case. Thanks!

    @ghost

    ghost commented Apr 2, 2019

    I have a similar crash with Python 3.7.2 on Linux.

    Steps to reproduce: send SIGINT while Python is initializing.

    I've built a debug version of Python 3.7.2 and collected a core dump:

    (gdb) thread apply all bt

    Thread 1 (Thread 0x7f8f5ee67e80 (LWP 13285)):
    #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
    #1 0x00007f8f5dfe742a in __GI_abort () at abort.c:89
    #2 0x0000559515870286 in fatal_error (prefix=prefix@entry=0x5595159837f0 <func.14264> "init_sys_streams", msg=msg@entry=0x559515983480 "can't initialize sys standard streams", status=-1) at Python/pylifecycle.c:2179
    #3 0x0000559515871062 in _Py_FatalInitError (err=...) at Python/pylifecycle.c:2198
    #4 0x000055951577d1f5 in pymain_init (pymain=pymain@entry=0x7ffe886aafc0) at Modules/main.c:3019
    #5 0x000055951577d215 in pymain_main (pymain=pymain@entry=0x7ffe886aafc0) at Modules/main.c:3032
    #6 0x000055951577d29a in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:3072
    #7 0x00005595157763e9 in main (argc=<optimized out>, argv=<optimized out>) at ./Programs/python.c:15

    @vstinner
    Member

    ktrace shows that dup(0) succeeded but fstat(0) failed.

    Aha, the problem is still the is_valid_fd() function:

    /* Prefer dup() over fstat(). fstat() can require input/output whereas
       dup() doesn't, there is a low risk of EMFILE/ENFILE at Python
       startup. */
    

    The function has been fixed on macOS with:

    #ifdef __APPLE__
        /* bpo-30225: On macOS Tiger, when stdout is redirected to a pipe
           and the other side of the pipe is closed, dup(1) succeed, whereas
           fstat(1, &st) fails with EBADF. Prefer fstat() over dup() to detect
           such error. */
        struct stat st;
        return (fstat(fd, &st) == 0);
    #else

    I see two options:

    • Only use dup() on platforms where we know that dup() is enough to detect corner cases: Linux and Windows
    • Force usage of fstat() on FreeBSD... But what about OpenBSD, NetBSD and other BSD variants?

    I wrote attached PR 12852 to only use dup() on Linux and Windows.
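The first option can be modelled in Python as a platform switch. This is only an illustration of the idea, not the C code from the PR; the constant `DUP_IS_ENOUGH` is a made-up name, and the Python `is_valid_fd` below merely borrows the name of the C function.

```python
import os
import sys

# Assumption taken from the option above: dup() is trusted only on
# Linux and Windows; every other platform falls back to fstat().
DUP_IS_ENOUGH = sys.platform.startswith("linux") or sys.platform == "win32"


def is_valid_fd(fd: int) -> bool:
    """Python model of a platform-dependent descriptor check."""
    try:
        if DUP_IS_ENOUGH:
            # Cheap check: duplicate the descriptor and close the copy.
            os.close(os.dup(fd))
        else:
            # Stricter check: the descriptor must also answer fstat().
            os.fstat(fd)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print([is_valid_fd(fd) for fd in (0, 1, 2)])
```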

    @vstinner
    Member

    Alexey Izbyshev: "I think that we can even drop dup-based validation from is_valid_fd() since there is a corner case for Linux too: if a descriptor opened with O_PATH is inherited as a standard one, dup() will succeed but fstat() will fail in kernels before 3.6. And we do fstat() almost immediately after is_valid_fd() to get blksize, so the dup-based optimization doesn't seem worth the trouble. Victor, do you have an opinion on that?"

    I don't understand this case. I'm not familiar with O_PATH, nor with how such a special file descriptor would be inherited. Would you mind elaborating?

    man open:

       O_PATH (since Linux 2.6.39)
              Obtain a file descriptor that can be used for two  purposes:  to
              indicate a location in the filesystem tree and to perform opera‐
              tions that act purely at the file descriptor  level.   The  file
              itself  is not opened, and other file operations (e.g., read(2),
              write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2))
              fail with the error EBADF.
    

    In the following C program, fd 0 is a file descriptor opened with O_PATH: dup(0) and fstat(0) both succeed, which is not surprising, since it's a valid file descriptor.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Fallback for libc headers that don't expose O_PATH by default. */
    #ifndef O_PATH
    #define O_PATH 010000000
    #endif

    int main(void)
    {
        /* Open the current directory as a path-only descriptor. */
        int path_fd = open(".", O_PATH);
        if (path_fd < 0) {
            perror("open");
            return 1;
        }
        /* Install it as fd 0; dup2() returns -1 on failure. */
        if (dup2(path_fd, 0) < 0) {
            perror("dup2");
            return 1;
        }

        int fd = dup(0);
        if (fd < 0) {
            perror("dup");
        }
        else {
            fprintf(stderr, "dup ok: %d\n", fd);
            close(fd);
        }

        struct stat st;
        if (fstat(0, &st) < 0) {
            perror("fstat");
        }
        else {
            printf("fstat ok\n");
        }

        return 0;
    }
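The same experiment can be run from Python, which exposes the flag as `os.O_PATH` on Linux. The sketch below (the function name `probe_o_path_fd` is made up) tries dup, fstat and read on an O_PATH descriptor and records the outcome of each; per the man page excerpt above, read is the operation expected to fail with EBADF.

```python
import errno
import os


def probe_o_path_fd():
    """Return which operations succeed on an O_PATH descriptor, or None
    if this platform's os module has no O_PATH."""
    if not hasattr(os, "O_PATH"):
        return None
    fd = os.open(".", os.O_PATH)
    results = {}
    try:
        for name, op in (
            ("dup", lambda: os.close(os.dup(fd))),
            ("fstat", lambda: os.fstat(fd)),
            ("read", lambda: os.read(fd, 1)),
        ):
            try:
                op()
                results[name] = "ok"
            except OSError as exc:
                # Record the symbolic errno name, e.g. "EBADF".
                results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    print(probe_o_path_fd())
```

If fstat reported an error here while dup succeeded, that would be the kind of concrete example against dup() that the earlier comment describes for kernels before 3.6.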

    @vstinner
    Member

    New changeset 3092d6b by Victor Stinner in branch 'master':
    bpo-32849: Fix is_valid_fd() on FreeBSD (GH-12852)
    3092d6b

    @vstinner
    Member

    New changeset b87a807 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
    bpo-32849: Fix is_valid_fd() on FreeBSD (GH-12852) (GH-12863)
    b87a807

    @vstinner
    Member

    In short, Python 2.7 doesn't seem to be affected by fstat/dup issues.

    Python 2.7 doesn't check whether file descriptors 0, 1 and 2 are valid at startup. Python 2 uses PyFile_FromFile() to create sys.stdin, sys.stdout and sys.stderr, which creates a "file" object. The function calls fstat(fd) but ignores any error: fstat() is only used to fail if the fd is a directory.

    Python 2.7 doesn't have the is_valid_fd() function.

    @vstinner
    Member

    Thanks Rudolph Froger for the bug report: the issue is now fixed in 3.7 and master (future Python 3.8) branches. Sorry for the delay.

    --

    Alexey Izbyshev wrote PR 5773 to also use fstat() on Linux.

    I chose to merge my PR 12852 which is more conservative: it keeps dup() on Linux. I'm not sure why exactly, but I recall that the author of the function, Antoine Pitrou, wanted to use dup() on Linux.

    I'm not convinced by the O_PATH issue on Linux (described above), so I merged my conservative change instead.

    Later, we can still move to fstat() on Linux as well if someone comes up with a more concrete example against dup().

    @rudolphfroger
    Mannequin Author

    rudolphfroger mannequin commented Apr 17, 2019

    Thanks all for the fixes!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022