classification
Title: [RFE] os.environ should support Path-like values, like subprocess(..., env=...)
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Antony.Lee, brett.cannon, eryksun, ethan.furman, rhettinger, serhiy.storchaka, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2020-01-27 10:25 by Antony.Lee, last changed 2020-08-31 21:17 by ethan.furman. This issue is now closed.

Messages (20)
msg360750 - (view) Author: Antony Lee (Antony.Lee) * Date: 2020-01-27 10:25
As of Py3.8/Linux:

    In [1]: os.environ["foo"] = Path("bar")                                                                             
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-4-2827297496cb> in <module>
    ----> 1 os.environ["foo"] = Path("bar")

    ~/miniconda3/envs/default/lib/python3.8/os.py in __setitem__(self, key, value)
        676     def __setitem__(self, key, value):
        677         key = self.encodekey(key)
    --> 678         value = self.encodevalue(value)
        679         self.putenv(key, value)
        680         self._data[key] = value

    ~/miniconda3/envs/default/lib/python3.8/os.py in encode(value)
        746         def encode(value):
        747             if not isinstance(value, str):
    --> 748                 raise TypeError("str expected, not %s" % type(value).__name__)
        749             return value.encode(encoding, 'surrogateescape')
        750         def decode(value):

    TypeError: str expected, not PosixPath

    In [2]: subprocess.run('echo "$foo"', env={**os.environ, "foo": Path("bar")}, shell=True)                           
    bar
    Out[2]: CompletedProcess(args='echo "$foo"', returncode=0)

I guess it would be nice if it was possible to set os.environ entries to Path-like values, but most importantly, it seems a bit inconsistent that doing so is not possible on os.environ, but works when setting the `env` of a subprocess call.
msg360756 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-27 12:26
subprocess.Popen encodes the env parameter using os.fsencode(). os.fsencode() has been patched in bpo-27182 to use os.fspath():

commit c1cbeedf0c650c3f7c64f04479070d39e15e1baf
Author: Ethan Furman <ethan@stoneleaf.us>
Date:   Sat Jun 4 10:19:27 2016 -0700

    issue27182: update fsencode and fsdecode for os.path(); patch by Dusty Phillips

For os.environ, you can use:

    os.environ["foo"] = str(Path("bar"))

I don't know if os.environ is supposed to use the fspath protocol. I add Ethan and Brett to the nosy list.

fspath is specified by the PEP 519 "Adding a file system path protocol":
https://www.python.org/dev/peps/pep-0519/
msg360773 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2020-01-27 15:57
Adding `os.environ` support makes sense to me.
msg360775 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-27 16:04
Currently, os.environ behaves as a dictionary. When you put value into os.environ[key], os.environ[key] gives you back this value. If we start to convert value to a different type (convert something to str), it can be surprising.
msg360777 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2020-01-27 16:32
True, but so is having Path objects not seemlessly usable.

Also, isn't os.environ a special case where all values should be strings?
msg360779 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-27 16:43
Implicit conversion can also hide some bugs.

Semantically os.environ is an str to str mapping and os.environb is a bytes to bytes mapping.
msg360784 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2020-01-27 17:36
True, but this example of implicit conversion is only for Path objects which are currently implicitly converted throughout the stdlib where appropriate, and this looks like one of those appropriate places.

I don't know enough about os.environb to offer an opinion on auto-conversion there.
msg360788 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-27 18:45
Supporting __fspath__ for os.environ[b] makes it consistent with POSIX os.putenv, which already supports it via PyUnicode_FSConverter. For example:

    os.putenv(Path('spam'), Path('eggs'))
    getenv = ctypes.CDLL('libc.so.6').getenv
    getenv.restype = ctypes.c_char_p

    >>> getenv(b'spam')
    b'eggs'

For symmetry, os.putenv in Windows could be updated to support __fspath__. An old patch for bpo-28188 implements this, along with bytes support, via PyUnicode_FSDecoder. But bytes support would be limited to UTF-8 strings, so it wouldn't be as flexible as os.environb in POSIX, which supports arbitrary bytes except for embedded nulls.
msg360795 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-01-27 20:12
I don't think this should be done (and I honestly would have disagreed with the Popen change for its 'env' argument or any other place that is dealing with environment variables). Environment variables are strings, period, so they should be specified as such; explicit is better than implicit (and there isn't enough pragmatic benefit to override this in my opinion).
msg360798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-27 20:41
> I don't think this should be done (and I honestly would have disagreed with the Popen change for its 'env' argument or any other place that is dealing with environment variables). Environment variables are strings, period, so they should be specified as such; explicit is better than implicit (and there isn't enough pragmatic benefit to override this in my opinion).

I don't think that subprocess accepts fspath on purpose. It uses fsencode() because it's convenient to encode Unicode to the filesystem encoding using the right error handler.

It's a side effect of the commit c1cbeedf0c650c3f7c64f04479070d39e15e1baf. In fact, I'm opposed to this commit, but now it's too change to change it again :-)

os.putenv() is unrelated to os.environ. Calling os.putenv() does not update os.environ, whereas setting a key in os.environ calls internally os.putenv().

I like the idea of keeping os.environ as simple as a dict mapping strings to trings (in term of API, internally it's more complicated).
msg360801 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-27 21:02
Path-like objects are now unintentionally accepted for many non-path things, like hostname in socket.sethostname() and username in os.initgroups(). I also think it was a mistake, and we should not make new mistakes.
msg360803 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-27 21:22
> I honestly would have disagreed with the Popen change for its 'env' 
> argument or any other place that is dealing with environment
> variables

os.spawnve (Windows) and os.execve support __fspath__ for the env  dict partially due to me (bpo-28114), so sorry if that was the wrong decision. However, at least on the POSIX side, I was restoring previous behavior that aligned with POSIX os.putenv. An example of what's supported currently:

Windows:

    >>> path = os.environ['comspec']
    >>> argv = ['cmd', '/c', 'echo %SPAM%']
    >>> env = os.environ.copy()
    >>> env[Path('SPAM')] = Path('eggs')
    >>> os.spawnve(0, path, argv, env)
    eggs
    0

POSIX:

    >>> path = '/bin/bash'
    >>> argv = ['bash', '-c', 'echo $SPAM']
    >>> env = os.environ.copy()
    >>> env[Path('SPAM')] = Path('eggs')
    >>> os.spawnve(0, path, argv, env)
    eggs
    0
msg360804 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2020-01-27 21:26
The idea behind PEP 519 was to alleviate str(path_obj) calls between the os/program interface.  We can either make that as consistent as we can as we find places that still require the str(path_obj) idiom, or we can make users remember which ones do, and which ones don't, easily use a Path object.

In my opinion, having to remember is a very unpleasant and frustrating user experience.

Serhiy Storchaka:
----------------
> Path-like objects are now unintentionally accepted for many non-path
> things, like hostname in socket.sethostname() and username in
> os.initgroups(). I also think it was a mistake, and we should not
> make new mistakes.

There will always be ways for users to make mistakes when using API's.  Whether they passed a Path path to os.sethostname() or a string path to os.sethostbyname(), it's still their bug, and linters/type-checkers exist to help catch those kinds of bugs.

Brett Canon:
-----------
> Environment variables are strings, period, so they should be specified
> as such;

Out of curiosity, any idea how often non-strings are used to set os.environ, and so need explicit conversion?
msg360890 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-01-28 18:35
> but now it's too change to change it again :-)

I actually don't think it is if we want to revert this. We can raise a deprecation warning if the call to os.fsencode() leads to a value different than its argument and say that people are expected to handle conversions themselves.

> The idea behind PEP 519 was to alleviate str(path_obj) calls between the os/program interface

... where file paths were explicitly expected. os.environ is not a place where file paths are explicitly expected, just a place where they _might_ end up. Basically I only consider PEP 519 valid in places where file paths are the only thing that are expected (e.g. all the path manipulation functions in os.path). Everywhere else where paths are only a "maybe" I would say it shouldn't be relied upon to do implicit conversions.

> Out of curiosity, any idea how often non-strings are used to set os.environ, and so need explicit conversion?

I have anecdotal evidence from outside of Python that people will do this when they can and then are surprised when the conversion doesn't work the way they expect (e.g. floats).
msg360894 - (view) Author: Antony Lee (Antony.Lee) * Date: 2020-01-28 18:44
FWIW, I'm actually fine with not supporting Path objects in os.environ, as long as the behavior is *consistent* with the env kwarg to subprocess.run() -- note that the original title of the thread only pointed out to the inconsistency, and did not strongly request that os.environ support Paths.
msg360901 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-01-28 21:19
> as long as the behavior is *consistent* with the env kwarg to 
> subprocess.run() 

subprocess isn't consistent with itself across platforms. The env parameter in Windows is strictly an str->str mapping, like os.environ. This is coded in getenvironment in Modules/_winapi.c:

    if (! PyUnicode_Check(key) || ! PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
            "environment can only contain strings");
        goto error;
    }

At a lower level, should the env parameter of os.spawnve and os.execve  allow path-like objects? Should (Unix) os.putenv allow the name and value to be path-like?

Currently these cases use PyUnicode_FSConverter and PyUnicode_FSDecoder, which support __fspath__ via PyOS_FSPath. PyUnicode_FSDecoder also supports the buffer protocol, with a warning. 

Since we have cases like this where the filesystem encoding is used for string data that's not actually a file path, should alternate ParseTuple converters be added that are limited to just str and bytes? Maybe name them PyUnicode_FSStringEncoder and PyUnicode_FSStringDecoder, where "String" emphasizes that fspath and buffer objects are not allowed. For example:

    int
    PyUnicode_FSStringEncoder(PyObject *path, void *addr)
    {
        PyObject *output = NULL;

        if (path == NULL) {
            Py_DECREF(*(PyObject **)addr);
            *(PyObject **)addr = NULL;
            return 1;
        }

        if (PyBytes_Check(path)) {
            output = path;
            Py_INCREF(output);
        }
        else if (PyUnicode_Check(path)) {
            output = PyUnicode_EncodeFSDefault(path);
            if (!output)
                return 0;
        }
        else {
            PyErr_Format(PyExc_TypeError, "path should be str or bytes, not "
                "%.200s", _PyType_Name(Py_TYPE(path)));
            return 0;
        }

        if ((size_t)PyBytes_GET_SIZE(output) !=
                strlen(PyBytes_AS_STRING(output)))
        {
            PyErr_SetString(PyExc_ValueError, "embedded null byte");
            Py_DECREF(output);
            return 0;
        }
        
        *(PyObject **)addr = output;
        return Py_CLEANUP_SUPPORTED;
    }


    int
    PyUnicode_FSStringDecoder(PyObject *path, void *addr)
    {
        PyObject *output = NULL;

        if (arg == NULL) {
            Py_DECREF(*(PyObject **)addr);
            *(PyObject **)addr = NULL;
            return 1;
        }

        if (PyUnicode_Check(path)) {
            output = path;
            Py_INCREF(output);
        }
        else if (PyBytes_Check(path)) {
            output = PyUnicode_DecodeFSDefaultAndSize(PyBytes_AS_STRING(path),
                        PyBytes_GET_SIZE(path));
            if (!output)
                return 0;
        }
        else {
            PyErr_Format(PyExc_TypeError, "path should be str or bytes, not "
                "%.200s", _PyType_Name(Py_TYPE(path)));
            return 0;
        }
        
        if (PyUnicode_READY(output) == -1) {
            Py_DECREF(output);
            return 0;
        }
        
        if (findchar(PyUnicode_DATA(output), PyUnicode_KIND(output),
                     PyUnicode_GET_LENGTH(output), 0, 1) >= 0) {
            PyErr_SetString(PyExc_ValueError, "embedded null character");
            Py_DECREF(output);
            return 0;
        }
        
        *(PyObject **)addr = output;
        return Py_CLEANUP_SUPPORTED;
    }
msg360905 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-01-28 22:07
I don't think proposal this makes sense.  os.environ is a dict-like object providing clean access to the environment variables.  Use of Paths is an orthogonal problem unrelated to environment variables.
msg360927 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-29 01:55
I understand that os.fsencode() was modified to support the fspath protocol to be consistent with the C implementation PyUnicode_FSConverter() which calls PyOS_FSPath().

I agree that at least in C, we need two functions: one which accepts (str, bytes), another which also supports the fspath protocol.

IMHO PyUnicode_FSConverter() was modified to support fspath because it was convenient to only modify one function. But this change wasn't designed to decide in each function if fspath should be accepted or not.

For me, os.putenv() should not accept fspath neither.

This issue is not specific to os.environ. It's more general about the PEP 519 implementation. IMHO the implementation should be "adjusted".

I like the deprecation idea ;-) We did something similar with functions accepting float by mistake. First a deprecation warning was emited, and then it became a hard error (exception).
msg360929 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2020-01-29 02:01
Well, I would prefer if Path objects were seamless to use since at one time they were actually strings, but I can live with Brett's rationale that they are only seamless where paths are explicitly expected.
msg375750 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-08-21 11:27
os.environ is initially a copy of the os string-string mapping.  That some string values happen to represent file paths is opaque to the mapping.  So, to me, looking at os.environ by itself, there is no particular reason to autoconvert Path values but not anything else nor to autoconvert values but not keys.

If os.environ['key'] = Path('boo') worked, I would expect os.environ['key2'] = 333 to work.  I would consider a new inconsistency here to be worse than the existing one between os.environ and subprocess.Popen(env=...).  It would be OK with me if the latter were fixed.

It is not unheard of for CPython to accept objects that conceptually should not be.  Some instances have be 'grandparented', others not.
History
Date User Action Args
2020-08-31 21:17:29ethan.furmansetstatus: open -> closed
resolution: rejected
stage: resolved
2020-08-21 11:27:37terry.reedysetnosy: + terry.reedy
messages: + msg375750
2020-01-29 02:01:48ethan.furmansetmessages: + msg360929
2020-01-29 01:55:41vstinnersetmessages: + msg360927
2020-01-28 22:07:57rhettingersetnosy: + rhettinger
messages: + msg360905
2020-01-28 21:19:03eryksunsetmessages: + msg360901
2020-01-28 18:44:56Antony.Leesetmessages: + msg360894
2020-01-28 18:35:53brett.cannonsetmessages: + msg360890
2020-01-27 21:26:46ethan.furmansetmessages: + msg360804
2020-01-27 21:22:53eryksunsetmessages: + msg360803
2020-01-27 21:02:45serhiy.storchakasetmessages: + msg360801
2020-01-27 20:41:44vstinnersetmessages: + msg360798
2020-01-27 20:12:23brett.cannonsetmessages: + msg360795
2020-01-27 18:45:46eryksunsetnosy: + eryksun
messages: + msg360788
2020-01-27 17:36:34ethan.furmansetmessages: + msg360784
2020-01-27 16:43:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg360779
2020-01-27 16:32:49ethan.furmansetmessages: + msg360777
2020-01-27 16:04:06vstinnersetmessages: + msg360775
2020-01-27 15:57:30ethan.furmansetmessages: + msg360773
2020-01-27 12:26:30vstinnersettype: enhancement
title: os.environ does not support Path-like values, but subprocess(..., env=...) does -> [RFE] os.environ should support Path-like values, like subprocess(..., env=...)
2020-01-27 12:26:04vstinnersetnosy: + brett.cannon, vstinner, ethan.furman
messages: + msg360756
2020-01-27 10:25:58Antony.Leecreate