classification
Title: Create a bytes version of os.environ and getenvb()
Type: enhancement Stage: resolved
Components: Interpreter Core, Library (Lib), Unicode Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, ezio.melotti, gregory.p.smith, lemburg, loewis, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2010-05-03 08:56 by vstinner, last changed 2010-09-10 22:19 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
os_environb-3.patch vstinner, 2010-05-06 13:03
Messages (38)
msg104825 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 08:56
As discussed in issue #8514, I propose a bytes version of os.envionb which would be synchronized with os.environ (which is possible thanks to surrogateescape error handler).

I also propose a os.getenvb()->bytes function.

I don't know yet if it's a good idea of not, but my patch accepts both bytes and str for os.environ(b).get() and os.getenv(b)().

antoine> In posixmodule.c, (...) if memory allocation of the bytes
antoine> object fails, we should error out.

I would require to change also the Windows version and the code specific to OS/2. Ok to do that, but after closing this issue ;-) I don't want to change to much things at the same time.
msg104827 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 09:10
The patch creates also fsencode()/fsdecode() functions proposed in #8514: I can rename them to use protected name (eg. "_encodeenv" and "_decodeenv") until we decided for these functions.
msg104838 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-03 13:55
A view comments on the patch:

+    def __init__(self, data, encodekey, decodekey, encodevalue, decodevalue, putenv, unsetenv):

As general guideline: When adding new parameter, please add them to the
end of the parameter list and preferably with a default argument in order
to not break the API.

Doesn't matter much in this case, since _Environ is only used internally,
but it's still good practice.

--

+data = {}
+for key, value in environ.items():
+    data[_keymap(key)] = fsencode(value)

Please put such init code into a function or make sure that the global
module space is not polluted with temporary variables such as data,
key, value.

--

+    # bytes environ
+    environb = _Environ(data, _keymap, fsencode, fsencode, fsencode, _putenv, _unsetenv)

This looks wrong even though it will work, but that's only a
side-effect of how fsencode is coded and that's not how the
stdlib should be coded (see below).

--

+    def fsencode(value):
+        """
+        unicode to file system
+        """
+        if isinstance(value, bytes):
+            return value
+        else:
+            return value.encode(sys.getfilesystemencoding(), 'surrogateescape')

The function should not accept bytes as input or at least
document this pass-through behavior, leaving the user to decide
whether that's wanted or not.

--

The patch is also missing code which keeps the two dictionaries in
sync. If os.environ changes, os.environb would have to change as
well.
msg104839 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 14:25
> The patch is also missing code which keeps the two dictionaries in
> sync. If os.environ changes, os.environb would have to change as
> well.

No, it doesn't :-) os.environ and os.environb are synchronized and there is a test for this! ;-)

I will see later for your other comments.
msg104871 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 19:33
Can somebody please explain what problem is being solved with this patch?
msg104877 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-03 20:27
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
> Can somebody please explain what problem is being solved with this patch?

The way os.environ is currently set up on Unix breaks applications
using the environment to pass data to helper applications.

Please see the discussion on http://bugs.python.org/issue8514
for details.
msg104878 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 20:30
> Please see the discussion on http://bugs.python.org/issue8514
> for details.

I can't see any report of actual breakage in that report, only claims of
potential breakage (with no supporting examples)
msg104879 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 20:36
An issue was opened 2 years ago: "It was brought up in a discussion of sending non-ASCii data to a CGI-WSGI script where the data would be transferred via os.environ." => #4006 (closed as "wont fix").
msg104880 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 20:45
> An issue was opened 2 years ago: "It was brought up in a discussion
> of sending non-ASCii data to a CGI-WSGI script where the data would
> be transferred via os.environ." => #4006 (closed as "wont fix").

Fortunately, that issue could now be reconsidered as "fixed"; the
example in the report (msg74118) now works correctly - thanks to PEP
383. So I still fail to see the problem.
msg104883 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-03 21:04
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> Please see the discussion on http://bugs.python.org/issue8514
>> for details.
> 
> I can't see any report of actual breakage in that report, only claims of
> potential breakage (with no supporting examples)

Set your CODESET to ASCII and watch the surrogate escaping
begin... seriously, Martin, if you've ever worked with CGI
or WSGI or FastCGI or SCGI or any of the many other protocols
that use the OS environment for passing data between processes,
it doesn't take much imagination to come up with examples
that fail left and right.

Here's one (RFC 3875, sections 4.1.7 and 4.1.5):

LANG = 'en_US.utf8'
CONTENT_TYPE = 'application/x-www-form-urlencoded'
QUERY_STRING = 'type=example&name=Löwis'
PATH_INFO = '/home/löwis/bin/mycgi.py'

(HTML uses Latin-1 as default encoding and so do many of the
 protocols invented for it !)

The file system encoding simply doesn't relate to the OS
environment at all - it's just a collection of name=value mappings
with no explicit encoding information. It may be a good guess,
but that's it.
msg104884 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 21:10
> Set your CODESET to ASCII and watch the surrogate escaping
> begin... seriously, Martin, if you've ever worked with CGI
> or WSGI or FastCGI or SCGI or any of the many other protocols
> that use the OS environment for passing data between processes,
> it doesn't take much imagination to come up with examples
> that fail left and right.
> 
> Here's one (RFC 3875, sections 4.1.7 and 4.1.5):
> 
> LANG = 'en_US.utf8'
> CONTENT_TYPE = 'application/x-www-form-urlencoded'
> QUERY_STRING = 'type=example&name=Löwis'
> PATH_INFO = '/home/löwis/bin/mycgi.py'

I still don't see a *failure* here. AFAICT, it all works correctly.
In particular, I fail to see the advantage of using bytes over using
escaped strings, in terms of correctness. I'm even skeptical that there
is an advantage in terms of usability (and if there is, I'd like to see
a demonstration of that, as well).
msg104886 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 21:30
> Here's one (RFC 3875, sections 4.1.7 and 4.1.5):
> 
> LANG = 'en_US.utf8'
> CONTENT_TYPE = 'application/x-www-form-urlencoded'
> QUERY_STRING = 'type=example&name=Löwis'
> PATH_INFO = '/home/löwis/bin/mycgi.py'
> 
> (HTML uses Latin-1 as default encoding and so do many of the
>  protocols invented for it !)

BTW, I think you are misinterpreting the RFC. It doesn't actually say
that QUERY_STRING is Latin-1 encoded, but instead, it says

"the details of the parsing, reserved characters and support for non
US-ASCII characters depends on the context"

Latin-1 is only given as a possible example. Apache passes the URL from
the HTTP request unescaped; browsers will likely CGI-escape it. So most
likely, it will be

QUERY_STRING = 'type=example&name=L%F6wis'
or
QUERY_STRING = 'type=example&name=L%C3%B6wis'

IMO, applications are much better off to consider QUERY_STRING as a
character string.
msg104889 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-03 22:02
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> Set your CODESET to ASCII and watch the surrogate escaping
>> begin... seriously, Martin, if you've ever worked with CGI
>> or WSGI or FastCGI or SCGI or any of the many other protocols
>> that use the OS environment for passing data between processes,
>> it doesn't take much imagination to come up with examples
>> that fail left and right.
>>
>> Here's one (RFC 3875, sections 4.1.7 and 4.1.5):
>>
>> LANG = 'en_US.utf8'
>> CONTENT_TYPE = 'application/x-www-form-urlencoded'
>> QUERY_STRING = 'type=example&name=Löwis'
>> PATH_INFO = '/home/löwis/bin/mycgi.py'
> 
> I still don't see a *failure* here. AFAICT, it all works correctly.

Your name will end up being partially escaped as surrogate:

'L\udcf6wis'

Further processing will fail, since the application would
correctly assume that the data is Latin-1 only (see the RFC):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
range(256)

> In particular, I fail to see the advantage of using bytes over using
> escaped strings, in terms of correctness. I'm even skeptical that there
> is an advantage in terms of usability (and if there is, I'd like to see
> a demonstration of that, as well).

The use of the 'surrogateescape' error handler modifies the
encoding used for the decoding of the bytes data and does this
implicitly.

This works fine as long as the data is only used *as reference* to
some entity (e.g. as in a file name) and manipulation of that
data is limited to concatenation and slicing. Things that you do
with file names and paths.

It doesn't work if an application tries to work *with* the data,
e.g. tries to convert it, parse it, decode it, etc. The reason is
that information included by the use of the 'surrogateescape'
error handler is lost along the way and this then causes data
corruption.
msg104894 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 22:11
> Your name will end up being partially escaped as surrogate:
> 
> 'L\udcf6wis'
> 
> Further processing will fail

That depends on the further processing, no?

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
> range(256)

Where did you get this error from?

> It doesn't work if an application tries to work *with* the data,
> e.g. tries to convert it

Converting it to what?

> parse it

Parsing will work fine.

> decode it

It's a string. You shouldn't decode it.

> The reason is
> that information included by the use of the 'surrogateescape'
> error handler is lost along the way and this then causes data
> corruption.

And how would that not happen if it was bytes? The problems you describe
were one of the primary motivations to switch to Unicode: it's *byte*
strings that have these problems.
msg104895 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 22:20
> Can somebody please explain what problem is being solved with this patch?

The problem is that we don't have reliable algorithm to get the encoding of each environment variable. We cannot ensure that all variables are encoded "correctly".

Using os.getenvb(), you can decode the string using the right encoding (which may be different for each variable).

Marc proposed os.getenv(key, encoding=...) but I don't like this solution: you have to split correctly all unicode things and all bytes things. You should have the choice to keep environment unchanged, as byte strings, and manipulate only byte strings, or to use the classic API (unicode only, os.getenv, os.environ(), ....).

os.environb and os.getenvb() will be required to applications in real world application supporting all applications and misconfigured environments. Python3 shouldn't try to fix misconfigured systems but leave this problem to the developer.
msg104897 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 22:36
> Using os.getenvb(), you can decode the string using the right
> encoding (which may be different for each variable).

Ok. If that's the motivation, the documentation should make that
clear (there isn't any documentation in the patch, anyway). I'm
worried that people start processing the bytes as-is (i.e. without
decoding them), and then start complaining that this and that library
doesn't support bytes. Then they start complaining that you can't
mix bytes and Unicode...

I also worry that people won't get it right any better than Python:
when they find that it doesn't work, they ask what to do, and people
will tell them decode with "iso-8859-1" (say). Then they do that,
and end up in moji-bake the next day.
msg104898 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-03 22:38
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> Here's one (RFC 3875, sections 4.1.7 and 4.1.5):
>>
>> LANG = 'en_US.utf8'
>> CONTENT_TYPE = 'application/x-www-form-urlencoded'
>> QUERY_STRING = 'type=example&name=Löwis'
>> PATH_INFO = '/home/löwis/bin/mycgi.py'
>>
>> (HTML uses Latin-1 as default encoding and so do many of the
>>  protocols invented for it !)
> 
> BTW, I think you are misinterpreting the RFC. It doesn't actually say
> that QUERY_STRING is Latin-1 encoded, but instead, it says
> 
> "the details of the parsing, reserved characters and support for non
> US-ASCII characters depends on the context"

Please read on:

"""
   For example, form submission from an HTML
   document [18] uses application/x-www-form-urlencoded encoding, in
   which the characters "+", "&" and "=" are reserved, and the ISO
   8859-1 encoding may be used for non US-ASCII characters.
"""

I could have also given you an example using 'multipart/form-data'
in which each part uses a different encoding or even sends binary
data by means of 'Content-Transfer-Encoding: binary'

These are not made up examples, they do occur in the real world for
which we are coding.

> Latin-1 is only given as a possible example. Apache passes the URL from
> the HTTP request unescaped; browsers will likely CGI-escape it. So most
> likely, it will be
> 
> QUERY_STRING = 'type=example&name=L%F6wis'
> or
> QUERY_STRING = 'type=example&name=L%C3%B6wis'
>
> IMO, applications are much better off to consider QUERY_STRING as a
> character string.

Believe me, I've been working with HTML, forms, web apps, etc.
for almost 20 years now. In the real world, your application has
to cope with any kind of data in QUERY_STRING.

And this is just one example of how the OS environment can
be used, e.g. to provide the user meta-data, license data,
company names.

Even if these all use UTF-8, a user might still want to stick
to ASCII as her CODESET and then all her Python application would
start to fail at first sight of a French accent or German
Umlaut.

PEP 383 is nice for file names and paths, but it's unfortunately
not going to save the world...
msg104900 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-03 22:51
> I also worry that people won't get it right any better than Python (...)

Developers coming from Python2 will continue to use os.getenv() and will not worry about encoding, and maybe not notice that the result is now unicode (and not more a byte string).

I think that a developer will only switch to os.getenvb() if he/she has troubles with the encodings.
msg104901 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-03 22:54
> I think that a developer will only switch to os.getenvb() if he/she
> has troubles with the encodings.

That's indeed a positive feature of this proposed change.
msg104913 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-04 08:51
Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> 
>> Your name will end up being partially escaped as surrogate:
>>
>> 'L\udcf6wis'
>>
>> Further processing will fail
> 
> That depends on the further processing, no?
> 
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
>> range(256)
> 
> Where did you get this error from?

The roundup email interface must have eaten this
first line of the traceback: >>> _.encode('latin-1')

>> It doesn't work if an application tries to work *with* the data,
>> e.g. tries to convert it
> 
> Converting it to what?
> 
>> parse it
> 
> Parsing will work fine.
> 
>> decode it
> 
> It's a string. You shouldn't decode it.
>
>> The reason is
>> that information included by the use of the 'surrogateescape'
>> error handler is lost along the way and this then causes data
>> corruption.
> 
> And how would that not happen if it was bytes? The problems you describe
> were one of the primary motivations to switch to Unicode: it's *byte*
> strings that have these problems.

Martin, it's obvious that you are not even trying to understand
what I'm saying. That's not a good basis for discussion.
msg104920 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-04 10:49
New version of my patch, which looks much better. Summary:

  Issue #8603: Create os.environb and os.getenvb() on POSIX system.
  os.unsetenv() encodes str argument using file system encoding and
  surrogateescape error handler (instead of utf8/strict), and accept bytes.

Changes with my previous patch:

 - update os module documentation
 - os.getenv() only accepts str, os.getenvb() only accepts bytes (to avoid mojibake)
 - fix test_environb() of test_os for ASCII locale
 - fix os.putenv() if key is an unicode string: use the string encoded to bytes as key for posix_putenv_garbage
 - _Envion.__setitem__() encodes the key and value before calling putenv()
 - _Environ.__delitem__() encodes the key before calling unsetenv()
 - create a temporary function to create os.environ (to use temporary variables like encode, decode, keymap, data)
 - annotation types in os.getenv() and os.getenvb()
 - remove fsencode()/fsdecode() from the patch and don't touch os._execvpe() (will be fixed in other issues)
 - putenv() uses PyBytes_GET_SIZE() instead of strlen()
msg104987 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-05 00:19
@loewis: So do you agree to add os.environb and os.getenvb()?

The documentation of os.environb and os.getenvb() in my last patch is very short. I'm not inspired.

We told me on IRC to not use function annotations because annotation semantic was not decided yet.

I will try to improve the documentation and remove the annotations in my next patch.
msg105003 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-05-05 07:59
STINNER Victor wrote:
> 
> The documentation of os.environb and os.getenvb() in my last patch is very short. I'm not inspired.
> 
> We told me on IRC to not use function annotations because annotation semantic was not decided yet.
> 
> I will try to improve the documentation and remove the annotations in my next patch.

Patch looks good. +1 on adding it.

One nit: I'd rename the keymap function to encodekey.
msg105006 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-05 09:00
MaL> Patch looks good. +1 on adding it.

Cool. I didn't understood if MvL is +1, but at least he's not -1 on this, and 
we are at least two at +1 :-)

MaL> One nit: I'd rename the keymap function to encodekey.

Ok, I will also change that in the final patch.
msg105080 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-05 19:43
> @loewis: So do you agree to add os.environb and os.getenvb()?

I agree with the patch (-2) in principle. I think the error handling
needs to improve:

py> os.getenvb('PATH')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/martin/work/3k/Lib/os.py", line 484, in getenvb
    return environb.get(key, default)
  File "/home/martin/work/3k/Lib/_abcoll.py", line 357, in get
    return self[key]
  File "/home/martin/work/3k/Lib/os.py", line 400, in __getitem__
    value = self.data[self.encodekey(key)]
TypeError: string argument without an encoding

which then leads to the natural, but incorrect

py> os.getenvb('PATH', encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: getenvb() got an unexpected keyword argument 'encoding'

The first error should remain TypeError, but say something like
"unsupported type string", or "bytes expected".

I notice an incompatible change: posix.environ has now a different
element type. This is probably fine.

There is a couple of white-space only changes in the patch; it would be
good if you could reduce them.
msg105115 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 00:46
> There is a couple of white-space only changes in the patch; 
> it would be good if you could reduce them.

"When two paths open to you, you should always choose the most difficult" (in french: "Quand deux chemins s'ouvrent à nous, il faut toujours choisir le plus difficile"). I fixed posixmodule.c instead of my patch :-)
msg105116 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 00:48
> I notice an incompatible change: posix.environ has now a
> different element type. This is probably fine.

I don't understand, what is an "element type"?
msg105124 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-06 04:52
>> I notice an incompatible change: posix.environ has now a
>> different element type. This is probably fine.
> 
> I don't understand, what is an "element type"?

In a container, the contents is sometimes called "elements"; their
type is the element type. Currently, keys and values are strings;
with the change, they are bytes.
msg105131 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 08:50
Aaaah, *posix*.environ, not *os*.environ, ok. I will fix posix documentation. I didn't knew this dictionary :-)
msg105142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 13:03
Patch version 3:
 - update posix documentation
 - improve os.environ and os.getenv() documentation: specify the type and document the encoding/error handler, add a link to environb/getenvb
 - os.environ and os.environb now check the argument types (raise a better error), on Windows and Unix. Before my patch, os.environ[b'key']=b'value' sets the variable "b'key'" to "b'value'" :-(
 - restore os.environb in TestEnviron.tearDown() (test_os)
 - fix patch on posixmodule.c indentation
 - fix a regression introduced by my patch: set PyErr_NoMemory() on error  (newstr==NULL in posixmodule.c)
 - rename keymap to encodekey
msg105143 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 13:09
Oh no, I forgot to remove the annotations from getenv() and getenvb() in os.py. I only removed them from the documentation.
msg105167 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-06 22:07
Commited as r80885 (py3k), blocked in 3.1 (r80886).

Thank you Martin and Marc for your great help and all your reviews ;-)
msg111878 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-07-29 01:00
FWIW os.environb is missing from os.__all__.
msg111880 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-07-29 01:08
A quick search[0] also shows that environ.data is used by several projects. Changing it from str to bytes will most likely break these programs, so I'm not sure it's a good idea.
Also, if I understand correctly, on Windows os.environ.data still contains str, so it's inconsistent with other systems.
IMHO os.environ.data should contain str, whereas os.environb.data bytes. If they must share the same data I would prefer them to be both str.

[0]: http://www.google.com/codesearch?q=lang%3Apython+environ.data&hl=en
msg111897 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-07-29 07:56
Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> A quick search[0] also shows that environ.data is used by several projects. Changing it from str to bytes will most likely break these programs, so I'm not sure it's a good idea.
> Also, if I understand correctly, on Windows os.environ.data still contains str, so it's inconsistent with other systems.
> IMHO os.environ.data should contain str, whereas os.environb.data bytes. If they must share the same data I would prefer them to be both str.
>
> [0]: http://www.google.com/codesearch?q=lang%3Apython+environ.data&hl=en

Direct use of os.environ.data is not permitted as it is not a documented
feature, so I wouldn't worry about this.

The few uses you found are easy to fix.
msg111972 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-29 17:11
Le jeudi 29 juillet 2010 03:08:27, Ezio Melotti a écrit :
> A quick search[0] also shows that environ.data is used by several projects.

_Environ is a wrapper on data: call putenv() when a value is changed and 
unputenv() when a value is removed. Since os.environb, it does also convert 
key and value types. So .data should not be used directly.

It looks like a common usage of .data is to get get to "a real dict" (and not 
"an user dict"). I think that os.environ.data or os.environ.data.copy() can be 
replaced by dict(os.environ) is that case.

> Changing it from str to bytes will most likely break these programs,

I'm not sure of that. It looks (in the search result) that os.environ.data is 
just stored and passed to a function, but not used as a dictionary to get a 
value (well, I'm not sure because it's hard to say with just 3 lines of the 
program). In that case, it doesn't matter if the dictionary contains byte or 
unicode strings.

Eg. zope2instance:

  old_env = os.environ.data.copy()
  ...
  os.environ.data = old_env

It does still work with bytes.

> so I'm not sure it's a good idea

Yes, it's not a good idea to use .data :-) This attribute should be protected, 
not public (even if it is not documented).

> IMHO os.environ.data should contain str, whereas os.environb.data
> bytes.

data is shared between os.environ and os.environb, because data is the 
solution to synchronize both dictionaries.

> If they must share the same data I would prefer them to be both
> str.

No, data should use the native type: bytes on UNIX and BSD, str on Windows.

--

If you still consider that the change on .data as a bug, I think that the fix 
is to remove .data (mark it as protected: environ.data => environ._data).
msg111976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-29 17:20
Le jeudi 29 juillet 2010 03:00:03, Ezio Melotti a écrit :
> FWIW os.environb is missing from os.__all__.

Fixed by r83237
msg116050 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-10 22:19
> If you still consider that the change on .data as a bug, 
> I think that the fix is to remove .data (mark it as 
> protected: environ.data => environ._data).

r84690 marks os.environ.data as protected. Close this issue again.
History
Date User Action Args
2010-09-10 22:19:03vstinnersetstatus: open -> closed

messages: + msg116050
2010-07-29 17:20:30vstinnersetmessages: + msg111976
2010-07-29 17:11:45vstinnersetmessages: + msg111972
2010-07-29 07:56:22lemburgsetmessages: + msg111897
2010-07-29 01:08:25ezio.melottisetstatus: closed -> open

messages: + msg111880
2010-07-29 01:00:00ezio.melottisettype: enhancement
messages: + msg111878
stage: resolved
2010-05-06 22:07:16vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg105167
2010-05-06 21:15:26vstinnersetfiles: - os_environb-2.patch
2010-05-06 21:15:23vstinnersetfiles: - os_environb.patch
2010-05-06 13:09:43vstinnersetmessages: + msg105143
2010-05-06 13:03:47vstinnersetfiles: + os_environb-3.patch

messages: + msg105142
2010-05-06 08:50:01vstinnersetmessages: + msg105131
2010-05-06 04:52:21loewissetmessages: + msg105124
2010-05-06 00:48:57vstinnersetmessages: + msg105116
2010-05-06 00:46:46vstinnersetmessages: + msg105115
2010-05-05 19:43:22loewissetmessages: + msg105080
2010-05-05 09:00:03vstinnersetmessages: + msg105006
2010-05-05 07:59:46lemburgsetmessages: + msg105003
2010-05-05 00:19:59vstinnersetmessages: + msg104987
2010-05-04 11:10:40vstinnerlinkissue8513 dependencies
2010-05-04 10:49:49vstinnersetfiles: + os_environb-2.patch

messages: + msg104920
2010-05-04 08:51:18lemburgsetmessages: + msg104913
2010-05-03 22:54:29loewissetmessages: + msg104901
2010-05-03 22:51:46vstinnersetmessages: + msg104900
2010-05-03 22:38:16lemburgsetmessages: + msg104898
2010-05-03 22:36:17loewissetmessages: + msg104897
2010-05-03 22:20:48vstinnersetmessages: + msg104895
2010-05-03 22:11:20loewissetmessages: + msg104894
2010-05-03 22:02:06lemburgsetmessages: + msg104889
2010-05-03 21:30:59loewissetmessages: + msg104886
2010-05-03 21:10:40loewissetmessages: + msg104884
2010-05-03 21:04:33lemburgsetmessages: + msg104883
2010-05-03 20:45:19loewissetmessages: + msg104880
2010-05-03 20:36:22vstinnersetmessages: + msg104879
2010-05-03 20:30:19loewissetmessages: + msg104878
2010-05-03 20:27:26lemburgsetmessages: + msg104877
2010-05-03 19:33:36loewissetmessages: + msg104871
2010-05-03 14:25:43vstinnersetmessages: + msg104839
2010-05-03 13:55:12lemburgsetmessages: + msg104838
2010-05-03 09:10:20vstinnersetmessages: + msg104827
2010-05-03 08:56:24vstinnercreate