This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: os.execvpe() doesn't support surrogates in env
Components: Library (Lib), Unicode
Versions: Python 3.1, Python 3.2

Process
Status: closed
Resolution: fixed
Nosy List: Arfrever, vstinner
Priority: normal
Keywords: patch

Created on 2010-04-14 00:01 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
os_execvpe_surrogates-2.patch (uploaded by vstinner, 2010-04-16 01:00)
Messages (9)
msg103100 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-14 00:01
It would be nice to support PEP 383 (surrogateescape) for environment variables in os.execvpe(). The attached patch uses PyUnicode_AsEncodedString(val, Py_FileSystemDefaultEncoding, "surrogateescape") to encode an environment variable value.

I'm not sure that PyUnicode_AsEncodedString(val, Py_FileSystemDefaultEncoding, "surrogateescape") always returns a PyBytes object.

I haven't patched environment keys, but doing so might be useful.
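The surrogateescape error handler from PEP 383 makes a bytes to str to bytes round trip lossless: each undecodable byte becomes a lone surrogate (U+DC80 to U+DCFF) and is turned back into the same byte on encoding. A minimal Python sketch, using UTF-8 explicitly rather than the filesystem encoding (Python 3.2 later added os.fsencode() and os.fsdecode() for that case):

```python
raw = b"caf\xff"  # not valid UTF-8: 0xff can never appear in a UTF-8 sequence

# Decoding with surrogateescape maps the undecodable byte to a lone surrogate
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # 'caf\udcff'

# Encoding with the same handler restores the original byte exactly
restored = text.encode("utf-8", "surrogateescape")
assert restored == raw

# A strict encoder refuses the lone surrogate
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encoding fails:", exc.reason)
```

This is why encoding environment values with surrogateescape lets a value read from os.environ be passed unchanged to a child process.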
msg103101 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-14 00:02
See also issue #4036.
msg103107 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-14 00:56
See also #8393.
msg103277 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-16 00:07
My patch doesn't work for the bytes and bytearray types.

I noticed that py3k uses surrogateescape to encode environment variable values ;-)
msg103278 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-16 00:10
Other notes: environment variable *names* also use the surrogateescape "encoding". os.spawnve() and os.spawnvpe() should be patched too (and the code they share should be factored out).
msg103279 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-16 01:00
New version of the patch:
 - factor out the code shared by execve(), spawnve() and spawnvpe()
 - also support surrogates in environment variable names
 - support bytes and bytearray (bytearray cannot be used as a dictionary key, but my patch supports it)
 - remove an unrelated fix (my first patch also contained a fix for os.system(), also about surrogates)

Because of the refactoring, the error messages no longer contain the function name. Like execve(), spawnve() and spawnvpe() now omit BEGINLIBPATH and ENDLIBPATH, variables "that Would Confuse Programs if Passed On", as the comment in execve() puts it. I suppose that if execve() ignores them, spawn*e() should ignore them too.
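The shared step described above can be sketched in Python. This is a hypothetical illustration, not the patch itself (the actual change is C code): the names build_env_list, _to_bytes and skip_os2_vars are invented here, and the case-insensitive match on BEGINLIBPATH/ENDLIBPATH is an assumption. str names and values go through surrogateescape, while bytes and bytearray are passed through as raw bytes.

```python
from typing import List, Mapping, Union

EnvItem = Union[str, bytes, bytearray]

# Names omitted on OS/2 builds; the case-insensitive match is an assumption
OS2_SKIPPED_NAMES = (b"BEGINLIBPATH", b"ENDLIBPATH")


def _to_bytes(item: EnvItem, encoding: str) -> bytes:
    """Convert one name or value to bytes (hypothetical helper)."""
    if isinstance(item, str):
        # PEP 383: lone surrogates turn back into the original raw bytes
        return item.encode(encoding, "surrogateescape")
    if isinstance(item, (bytes, bytearray)):
        return bytes(item)
    raise TypeError("environment names and values must be str, bytes or bytearray")


def build_env_list(env: Mapping[EnvItem, EnvItem], encoding: str = "utf-8",
                   skip_os2_vars: bool = False) -> List[bytes]:
    """Build the b"NAME=VALUE" entries that execve(), spawnve() and
    spawnvpe() could all share (hypothetical helper)."""
    entries = []
    for name, value in env.items():
        raw_name = _to_bytes(name, encoding)
        if skip_os2_vars and raw_name.upper() in OS2_SKIPPED_NAMES:
            continue
        entries.append(raw_name + b"=" + _to_bytes(value, encoding))
    return entries


env = {"PATH": "/bin", b"RAW": b"\xff", "S": "caf\udcff", "BEGINLIBPATH": "x"}
print(build_env_list(env, skip_os2_vars=True))
# [b'PATH=/bin', b'RAW=\xff', b'S=caf\xff']
```

Keeping the conversion in one helper is what makes the error messages generic: the helper no longer knows which of the three functions called it.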

I don't have OS/2, so I'm unable to test my patch on that OS :-/

Note: the patch also fixes subprocess to support bytes and bytearray in the environment dictionary.
msg103459 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-18 00:10
The current execve() code has a bug: it allocates the "p" buffer using the length of the environment variable value in *characters* instead of in *bytes*. I remember that someone wrote a comment about this somewhere... As a result, non-ASCII values are truncated (by 1 byte in the example below).

Example (copy of http://dpaste.com/184803/):
-----------
$ cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-

import os

env = {"VAR": "ćd"}
os.execve("test.sh", [], env)
$ cat test.sh
#!/bin/bash

declare -p VAR
$ python2.6 test.py
declare -x VAR="ćd"
$ python3.1 test.py
declare -x VAR="ć"
-----------
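The arithmetic behind the truncation can be checked with a small Python simulation. It assumes the C code sized the buffer as len(name) + len(value) + 2 characters (one for "=" and one for the terminating NUL) and filled it with an snprintf-style call that keeps at most size - 1 bytes; buggy_env_entry and fixed_env_entry are hypothetical names.

```python
def buggy_env_entry(name: str, value: str, encoding: str = "utf-8") -> bytes:
    """Size the buffer in characters, as the buggy execve() did (assumed)."""
    size = len(name) + 1 + len(value) + 1  # "=" and NUL counted as 1 each
    entry = name.encode(encoding) + b"=" + value.encode(encoding)
    # snprintf(buf, size, ...) keeps at most size - 1 bytes, then writes NUL
    return entry[: size - 1]


def fixed_env_entry(name: str, value: str, encoding: str = "utf-8") -> bytes:
    """Size the buffer from the lengths of the encoded name and value."""
    raw_name, raw_value = name.encode(encoding), value.encode(encoding)
    size = len(raw_name) + 1 + len(raw_value) + 1
    entry = raw_name + b"=" + raw_value
    return entry[: size - 1]


# "ć" (U+0107) takes 2 bytes in UTF-8, so "ćd" is 2 characters but 3 bytes
print(buggy_env_entry("VAR", "ćd").decode("utf-8"))  # VAR=ć
print(fixed_env_entry("VAR", "ćd").decode("utf-8"))  # VAR=ćd
```

Pure ASCII values are unaffected because one character is one byte; in general the buggy cut can also land in the middle of a multibyte character.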
msg104055 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-23 21:45
Committed: r80421 (py3k), blocked in 3.1 (r80422). The commit also fixes os.getenv() to support a bytes environment variable name.
msg104175 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2010-04-25 22:40
I blocked the fix in Python 3.1 because it's non-trivial and I prefer to avoid complex changes in Python 3.1. But then I realized that Python 3.1 has two bugs related to environment variables.

It uses sys.getfilesystemencoding() with surrogateescape to decode variables, but sys.getdefaultencoding() with strict to encode them: the encodings differ!
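A short Python illustration of why mismatched codecs break the round trip. Latin-1 stands in for a non-UTF-8 filesystem encoding, and UTF-8 stands in for sys.getdefaultencoding(), which is always "utf-8" in Python 3; buggy_roundtrip and fixed_roundtrip are hypothetical names.

```python
def buggy_roundtrip(raw: bytes, fs_encoding: str) -> bytes:
    # os.environ side: filesystem encoding + surrogateescape
    value = raw.decode(fs_encoding, "surrogateescape")
    # Buggy execve() side: default encoding (UTF-8 in Python 3) + strict
    return value.encode("utf-8", "strict")


def fixed_roundtrip(raw: bytes, fs_encoding: str) -> bytes:
    # Same codec and error handler in both directions
    value = raw.decode(fs_encoding, "surrogateescape")
    return value.encode(fs_encoding, "surrogateescape")


# Latin-1 locale: "é" is 1 byte on the way in but 2 bytes on the way out
print(buggy_roundtrip(b"caf\xe9", "latin-1"))  # b'caf\xc3\xa9'
print(fixed_roundtrip(b"caf\xe9", "latin-1"))  # b'caf\xe9'

# UTF-8 locale: an undecodable byte becomes a surrogate that strict rejects
try:
    buggy_roundtrip(b"\xff", "utf-8")
except UnicodeEncodeError:
    print("buggy round trip raises UnicodeEncodeError")
print(fixed_roundtrip(b"\xff", "utf-8"))  # b'\xff'
```

So the child process either receives different bytes than the parent read, or the call fails outright.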

It counts the number of *characters* to allocate the *byte* string buffer, so non-ASCII values are truncated.

So I decided to backport the fix: r80494.
History
Date | User | Action | Args
2022-04-11 14:56:59 | admin | set | github: 52638
2010-04-25 22:40:52 | vstinner | set | messages: + msg104175
2010-04-23 21:45:07 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg104055
2010-04-18 16:06:16 | Arfrever | set | nosy: + Arfrever
2010-04-18 00:10:33 | vstinner | set | messages: + msg103459
2010-04-16 01:16:56 | vstinner | set | files: - os_execvpe_surrogates.patch
2010-04-16 01:00:29 | vstinner | set | files: + os_execvpe_surrogates-2.patch; messages: + msg103279
2010-04-16 00:10:27 | vstinner | set | messages: + msg103278
2010-04-16 00:07:24 | vstinner | set | messages: + msg103277
2010-04-14 00:56:11 | vstinner | set | messages: + msg103107
2010-04-14 00:02:40 | vstinner | link | issue8242 dependencies
2010-04-14 00:02:01 | vstinner | set | messages: + msg103101
2010-04-14 00:01:36 | vstinner | create |