classification
Title: os.posix_spawn errors with wrong information when shebang points to not-existing file
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, gregory.p.smith, hroncok, izbyshev, nanjekyejoannah, pablogsal, torsava, vstinner
Priority: normal Keywords:

Created on 2021-02-03 10:43 by torsava, last changed 2021-02-11 08:08 by hroncok. This issue is now closed.

Messages (21)
msg386187 - (view) Author: Tomas Orsava (torsava) * Date: 2021-02-03 10:43
os.posix_spawn fails with misleading error information when executing an existing file whose shebang points to a non-existing file.



$ cat demo
#!/usr/bin/hugo

$ ./demo
bash: ./demo: /usr/bin/hugo: bad interpreter: No such file or directory

$ cat repro.py
import os
os.posix_spawn("./demo", ["./demo"], {})

$ python3.10 repro.py
Traceback (most recent call last):
  File "/home/torsava/mess-old/2021-02/python-popen/repro.py", line 2, in <module>
    os.posix_spawn("./demo", ["./demo"], {})
FileNotFoundError: [Errno 2] No such file or directory: './demo'



The same problem exists when `demo` is on the PATH.



$ export PATH=".:$PATH"

$ demo
bash: ./demo: /usr/bin/hugo: bad interpreter: No such file or directory

$ cat repro_path.py
import os
os.posix_spawn("demo", ["demo"], {})

$ python3.10 repro_path.py
Traceback (most recent call last):
  File "/home/torsava/mess-old/2021-02/python-popen/repro_path.py", line 2, in <module>
    os.posix_spawn("demo", ["demo"], {})
FileNotFoundError: [Errno 2] No such file or directory: 'demo'
msg386188 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-03 10:53
I don't think posix_spawn actually reads $PATH (hence the second example is pretty much doing the same as the first one), but this problem also manifests with subprocess (which does).
msg386189 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 10:55
os.posix_spawn() is a thin wrapper around posix_spawn(). Python deliberately does not try to change its behavior. So I don't think that this issue is a Python bug.
msg386190 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 10:56
If you want to look for the "demo" program in the PATH environment variable, use os.posix_spawnp() instead:
https://docs.python.org/dev/library/os.html#os.posix_spawnp
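
A minimal sketch of the PATH-searching variant, assuming a POSIX system where `sh` can be found on PATH and Python 3.9+ (for os.waitstatus_to_exitcode). The helper name spawn_via_path is made up for illustration:

```python
import os


def spawn_via_path(program, args):
    """Spawn `program` by bare name, letting posix_spawnp() search PATH.

    Returns the child's exit code.
    """
    env = dict(os.environ)  # pass the current environment through to the child
    pid = os.posix_spawnp(program, [program, *args], env)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # "sh" is given without a directory; posix_spawnp() resolves it via PATH.
    print(spawn_via_path("sh", ["-c", "exit 3"]))
```

With os.posix_spawn() the same bare name would be treated as a path relative to the current directory, which is why the second reproducer in the report behaves like the first.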
msg386191 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 11:00
> FileNotFoundError: [Errno 2] No such file or directory: './demo'

The './demo' filename is set by the following code in Modules/posixmodule.c:

    if (err_code) {
        errno = err_code;
        PyErr_SetFromErrnoWithFilenameObject(PyExc_OSError, path->object);
        goto exit;
    }

I understand that Tomas wants to raise the OSError with no filename.

I am adding Pablo and Joannah to the loop; they worked on exposing the posix_spawn function in Python.
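
For readers who want to see what the filename argument changes, here is a rough Python-level equivalent of the C snippet above. It is only a sketch: make_spawn_error is a hypothetical helper, not something posixmodule.c exposes.

```python
import errno
import os


def make_spawn_error(err_code, path=None):
    """Build an OSError from an errno, optionally attaching a filename.

    Approximates what PyErr_SetFromErrnoWithFilenameObject() produces.
    """
    if path is None:
        return OSError(err_code, os.strerror(err_code))
    return OSError(err_code, os.strerror(err_code), path)


with_name = make_spawn_error(errno.ENOENT, "./demo")
without_name = make_spawn_error(errno.ENOENT)

# OSError picks the FileNotFoundError subclass from the errno in both cases;
# only the trailing ": './demo'" part of the message differs.
print(type(with_name).__name__, "-", with_name)
print(type(without_name).__name__, "-", without_name)
```

The exception type stays FileNotFoundError either way, so dropping the filename would only remove the path from the message.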
msg386193 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-03 11:03
Ideally, the error would say:

FileNotFoundError: ./demo: /usr/bin/hugo: bad interpreter: No such file or directory
msg386201 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 13:16
> FileNotFoundError: ./demo: /usr/bin/hugo: bad interpreter: No such file or directory

Python has no knowledge of executable formats, shell or anything. It only calls posix_spawn() and raises an OSError on error.
msg386206 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2021-02-03 13:51
> Ideally, the error would say:

> FileNotFoundError: ./demo: /usr/bin/hugo: bad interpreter: No such file or directory

The kernel simply returns ENOENT on an attempt to execve() a file with non-existing hash-bang interpreter. The same occurs on an attempt to run a dynamically linked ELF executable with INTERP program header containing a non-existing path. And, of course, the same error is returned if the executable file itself doesn't exist, so there is no simple way to distinguish such cases.

Bash simply assumes[1] that if a file contains a hash-bang and the error from execve() is not recognized otherwise, it's a "bad interpreter".

Note that all of the above is completely unrelated to os.posix_spawn(): subprocess or os.execve() would produce the same message.

[1] https://git.savannah.gnu.org/cgit/bash.git/tree/execute_cmd.c?h=bash-5.1#n5854
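
The ambiguity can be reproduced with a short sketch. It assumes a POSIX system where a temporary directory allows executing files, and /nonexistent/interpreter is a made-up path that is assumed not to exist:

```python
import errno
import os
import tempfile


def spawn_errno(path):
    """Try to spawn `path`; return (errno name, exc.filename), or None if it started."""
    try:
        pid = os.posix_spawn(path, [path], {})
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno)), exc.filename
    os.waitpid(pid, 0)  # reap the child if the spawn unexpectedly succeeded
    return None


def demo():
    """Compare a script with a missing interpreter against a missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo")
        with open(script, "w") as f:
            f.write("#!/nonexistent/interpreter\n")  # assumed-missing interpreter
        os.chmod(script, 0o755)
        missing = os.path.join(tmp, "no-such-file")
        return spawn_errno(script), spawn_errno(missing), os.path.exists(script)


if __name__ == "__main__":
    bad_shebang, missing_file, script_exists = demo()
    print("script exists:", script_exists)
    print("bad shebang  :", bad_shebang)
    print("missing file :", missing_file)
```

Both calls report the same errno, and in both cases the filename attached to the exception is the path that was passed in, not the interpreter.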
msg386207 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-03 13:52
That was the "ideal" error message. If we don't have all the information, we cannot have the ideal error message. But we need to adapt the default error message so that it is not misleading. What about:

FileNotFoundError: [Errno 2] No such file or directory: Either './demo' or the interpreter of './demo' not found.
msg386210 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2021-02-03 14:16
> FileNotFoundError: [Errno 2] No such file or directory: Either './demo' or the interpreter of './demo' not found.

This doesn't sound good to me because a very probable reason and a very improbable one are combined without any distinction. Another possible issue is that the use of the word "interpreter" in this narrow sense might be non-obvious to users.

ISTM that the most minimal way to check for the possibility of an interpreter issue would be to do something like `access(exe_path, X_OK)` in case of ENOENT: if it's successful, then a "bad interpreter" condition is likely. But in the case of os.posix_spawnp(), the search in PATH is performed by libc, so CPython doesn't know exe_path. OTOH, subprocess and os.exec*p do perform their own search in PATH, but in the case of subprocess it happens in the context of the child process, so we'll probably need to devise a separate error code to report to the parent via the error pipe to distinguish this condition.

So far, I'm not convinced that the result is worth it, but I do agree that such mysterious "No such file" errors are not user-friendly.
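
The access() check described above could look roughly like this at the Python level. This is a sketch of the heuristic only, not a proposed patch; spawn_with_hint and its message wording are invented for illustration, and it covers os.posix_spawn() but not the PATH-searching cases.

```python
import errno
import os
import tempfile


def spawn_with_hint(path, argv, env):
    """os.posix_spawn() wrapper that annotates a likely 'bad interpreter' ENOENT."""
    try:
        return os.posix_spawn(path, argv, env)
    except FileNotFoundError as exc:
        # Heuristic: the file is present and executable, yet exec reported
        # ENOENT, so something it depends on (such as its #! interpreter)
        # is probably what is missing.
        if os.access(path, os.X_OK):
            hint = "file exists and is executable; its interpreter may be missing"
            message = f"{os.strerror(errno.ENOENT)} ({hint})"
            raise FileNotFoundError(errno.ENOENT, message, path) from exc
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo")
        with open(script, "w") as f:
            f.write("#!/nonexistent/interpreter\n")  # assumed-missing interpreter
        os.chmod(script, 0o755)
        try:
            spawn_with_hint(script, [script], {})
        except OSError as exc:
            print(exc)
```

A truly missing file falls through to the plain re-raise, so the common case keeps its current message.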
msg386217 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 15:53
IMO the fix is simple: only create OSError from the errno, never pass a filename.

posix_spawn() is a really complex function which can fail in many different ways. The filename is correct only in some very specific cases.

"""
ERRORS

       The posix_spawn() and posix_spawnp() functions fail only in the
       case where the underlying fork(2), vfork(2) or clone(2) call
       fails;  in these cases, these functions return an error number,
       which will be one of the errors described for fork(2), vfork(2)
       or clone(2).

       In addition, these functions fail if:

       ENOSYS Function not supported on this system.
"""

https://man7.org/linux/man-pages/man3/posix_spawn.3.html

Hum. I'm not sure that manual page is up to date. In glibc, it can also report exec() failure using a pipe, if I recall correctly.
msg386218 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2021-02-03 16:56
> IMO the fix is simple: only create OSError from the errno, never pass a filename.

This will remove a normally helpful piece of the error message in exchange for being marginally less confusing in the rare case of a non-existing interpreter (the user will still be left wondering why the file they tried to run exists, but "No such file or directory" is reported). So the only "win" here would be for CPython developers because users will be less likely to report a bug like this one.

> posix_spawn() is really complex function which can fail in many different ways.

This issue is not specific to posix_spawn(): subprocess and os.execve() report the filename too. Any fix would need to change those too for consistency.
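
A small sketch showing the same report coming from subprocess, assuming a POSIX system where a temporary directory allows executing files (/nonexistent/interpreter is a made-up path):

```python
import os
import subprocess
import tempfile


def run_bad_shebang():
    """Run a script whose #! interpreter is missing; return (exception, script path)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo")
        with open(script, "w") as f:
            f.write("#!/nonexistent/interpreter\n")  # assumed-missing interpreter
        os.chmod(script, 0o755)
        try:
            subprocess.run([script], check=False)
        except OSError as exc:
            return exc, script
    return None, script


if __name__ == "__main__":
    exc, script = run_bad_shebang()
    # The exception names the script, although the script itself exists.
    print(type(exc).__name__, exc)
```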
msg386486 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-02-04 17:12
I suggest making no change here, except maybe documenting it somewhere. Removing the filename would make this problem even harder to diagnose. And adding additional code to an error condition just increases the chance of something failing. The error reporting should be as simple as possible.
msg386493 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-02-04 19:42
That bash produces a nicer error message is because bash happens to implement its own special logic to try and figure out why an exec failed with an error other than ENOEXEC.  The OS kernel & libc do not give it that information, there is no such errno.  Bash inspects the executable itself for a #! in some error paths, extracts an interpreter from that and constructs an error message involving it.  (bash's execute_cmd.c and HAVE_HASH_BANG_EXEC logic)

I agree with Eric.  I don't think we should do anything here.  This isn't posix_spawn, subprocess, or os.exec* specific.  It's just how posix-y OSes that have the concept of a #! line interpreter for executable files work.  The errno that comes back from an exec failure is not super informative.

If someone disagrees, the way to move forward on this is to implement equivalent logic in a central place and have it called from all of the relevant places within the posixmodule (os) and the _posixsubprocess module.  With tests of all possible errno cases and code paths for each.  And make a PR out of that.

If you're going to take that on; do _not_ look at the bash source code.  That's GPL, we cannot copy it.  Just go by this description.

To me, this seems over complicated to get right and maintain.  I'd rather not do this within Python.  But if someone is going to make a PR for it, I'll at least look at it to see if it seems like something we could accept maintenance of.  I cannot guarantee we'd accept it.
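
Going only by the description in this message (and not by any shell's source), the shebang inspection might be sketched as follows. Both helper names and the message wording are hypothetical, and a real implementation would have to cover far more errno cases and code paths:

```python
import os


def shebang_interpreter(path):
    """Return the interpreter named on the file's #! line, or None if there is none."""
    try:
        with open(path, "rb") as f:
            first_line = f.readline(256)
    except OSError:
        return None
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].split()
    return os.fsdecode(parts[0]) if parts else None


def explain_enoent(path):
    """Guess why exec'ing `path` failed with ENOENT (best-effort heuristic)."""
    if not os.path.exists(path):
        return f"{path}: no such file"
    interp = shebang_interpreter(path)
    if interp is not None and not os.path.exists(interp):
        return f"{path}: {interp}: bad interpreter: no such file"
    return f"{path}: a file needed to run it is missing"
```

Note that the check runs after the failure, so the file system may have changed in between; the result is a guess, not a diagnosis.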
msg386773 - (view) Author: Tomas Orsava (torsava) * Date: 2021-02-10 10:41
I agree that at least documenting the behaviour is a good idea. This bug has seriously confused our QE person with years of experience, and then me when debugging with him. Chances are it's going to confuse somebody else too.
msg386786 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2021-02-10 15:28
How do you propose to approach documentation of such behavior? The underlying cause is the ambiguity of the ENOENT error code returned by the kernel from execve(), so it applies to all places where Python can call execve(), including os.posix_spawn(), os.execve() and subprocess, so it's not clear to me where such documentation should be placed. And, of course, this behavior is not specific to CPython.

The Linux man pages mention various causes of this error[1], though POSIX doesn't[2].

While ENOENT ambiguity is indeed confusing, one of the top results of my DDG search on "linux no such file or directory but script exists" is this link[3].

[1] https://man7.org/linux/man-pages/man2/execve.2.html
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html
[3] https://stackoverflow.com/questions/3949161/no-such-file-or-directory-but-it-exists
msg386791 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-10 17:00
Note that ENOENT is ambiguous, but the exception message is very specific about what file is not found. And it is not always correct.
msg386793 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-02-10 17:17
I agree with @hroncok, and maybe we could tweak the message to say

FileNotFoundError: [Errno 2] No such file or directory: while executing './demo'. Maybe bad shebang, or missing file?

Or something to that effect. I realize that listing all possible error reasons is a fool's errand, and there are cases where it might make things more confusing.

This reminds me of the old MS-DOS errors like "A duplicate file name exists, or the file cannot be found": as a user, I always wanted to scream "you know which one, tell me!". Sadly, you can't always get the OS to give you the info. Which is also like Windows "can't load DLL" errors: which one?!
msg386798 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2021-02-10 18:16
I generally agree, but getting a good, short error message seems to be the hard part here. I previously complained[1] about the following proposal by @hroncok:

FileNotFoundError: [Errno 2] No such file or directory: Either './demo' or the interpreter of './demo' not found.

But maybe it's just me. Does anybody else feel that mentioning "the interpreter" in this way could be confusing in the prevalent cases where the actual problem is a missing './demo' itself? If we can come up with a good message, I can look into turning it into a PR.

The error message above also reads to me as if there were no other possible reasons for ENOENT. On Linux, binfmt_misc[2] provides a way to run arbitrary code on execve(). This is used, for example, to transparently run binaries for foreign arches via qemu-user, so ENOENT would probably be returned if QEMU itself is missing. QEMU *may* be thought of as a kind of interpreter here, though it's completely unrelated to a hash-bang or an ELF interpreter.

But I don't know about other POSIX platforms. As a theoretical example, if the dynamic library loader is implemented in the kernel, the system call could return ENOENT in case of a missing library. Do we need to worry about it? Does anybody know about the situation on macOS, where posix_spawn() is a system call too?

[1] https://bugs.python.org/issue43113#msg386210
[2] https://en.wikipedia.org/wiki/Binfmt_misc
msg386821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-11 07:56
> Note that ENOENT is ambiguous, but the exception message is very specific about what file is not found. And it is not always correct.

If you want to be extra nice, you can try to check if the file exists. If it exists, add a message about the shebang.

The problem is that the code to generate the exception is quite generic: it creates an OSError exception object from an errno (int) and a filename (str). It's not trivial to change the exception message. Is it worth it?
msg386824 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-11 08:08
At the very least, it's worth documenting in all places that can give you the exception, such as subprocess etc.

Whether it's worth trying to fix the exception message, I don't really know. Confused users on one side, a fragile, complex heuristic on the other.

Honestly, I think it is worth it to at least try to brainstorm how to improve the message without making it needlessly complicated. I've attempted that with: Either demo or interpreter of demo. It's not perfect and maybe another file is missing. So let's go further:

FileNotFoundError: [Errno 2] No such file or directory while attempting to execute './demo'. This means './demo' or other files needed to run it don't exist.
History
Date                 User             Action  Args
2021-02-11 08:08:25  hroncok          set     messages: + msg386824
2021-02-11 07:56:35  vstinner         set     messages: + msg386821
2021-02-10 18:16:24  izbyshev         set     messages: + msg386798
2021-02-10 17:17:26  eric.smith       set     messages: + msg386793
2021-02-10 17:00:30  hroncok          set     messages: + msg386791
2021-02-10 15:28:37  izbyshev         set     messages: + msg386786
2021-02-10 10:41:40  torsava          set     messages: + msg386773
2021-02-04 19:42:12  gregory.p.smith  set     status: open -> closed; type: behavior -> enhancement; messages: + msg386493; resolution: not a bug; stage: needs patch
2021-02-04 17:12:35  eric.smith       set     nosy: + eric.smith; messages: + msg386486
2021-02-03 16:56:28  izbyshev         set     nosy: + gregory.p.smith; messages: + msg386218
2021-02-03 15:53:03  vstinner         set     messages: + msg386217
2021-02-03 14:16:46  izbyshev         set     messages: + msg386210
2021-02-03 13:52:06  hroncok          set     messages: + msg386207
2021-02-03 13:51:18  izbyshev         set     nosy: + izbyshev; messages: + msg386206
2021-02-03 13:16:05  vstinner         set     messages: + msg386201
2021-02-03 11:03:37  hroncok          set     messages: + msg386193
2021-02-03 11:00:35  vstinner         set     nosy: + pablogsal, nanjekyejoannah; messages: + msg386191
2021-02-03 10:56:56  vstinner         set     messages: + msg386190
2021-02-03 10:55:40  vstinner         set     nosy: + vstinner; messages: + msg386189
2021-02-03 10:53:06  hroncok          set     messages: + msg386188
2021-02-03 10:45:11  hroncok          set     nosy: + hroncok; title: os.posix_spawn errors with wrong information when shebang does not exist -> os.posix_spawn errors with wrong information when shebang points to not-existing file
2021-02-03 10:43:04  torsava          create