New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a bytes version of os.environ and getenvb() #52849
Comments
As discussed in issue bpo-8514, I propose a bytes version of os.envionb which would be synchronized with os.environ (which is possible thanks to surrogateescape error handler). I also propose a os.getenvb()->bytes function. I don't know yet if it's a good idea of not, but my patch accepts both bytes and str for os.environ(b).get() and os.getenv(b)(). antoine> In posixmodule.c, (...) if memory allocation of the bytes I would require to change also the Windows version and the code specific to OS/2. Ok to do that, but after closing this issue ;-) I don't want to change to much things at the same time. |
The patch creates also fsencode()/fsdecode() functions proposed in bpo-8514: I can rename them to use protected name (eg. "_encodeenv" and "_decodeenv") until we decided for these functions. |
A view comments on the patch: + def __init__(self, data, encodekey, decodekey, encodevalue, decodevalue, putenv, unsetenv): As general guideline: When adding new parameter, please add them to the Doesn't matter much in this case, since _Environ is only used internally, -- +data = {} Please put such init code into a function or make sure that the global -- + # bytes environ This looks wrong even though it will work, but that's only a -- + def fsencode(value): The function should not accept bytes as input or at least -- The patch is also missing code which keeps the two dictionaries in |
No, it doesn't :-) os.environ and os.environb are synchronized and there is a test for this! ;-) I will see later for your other comments. |
Can somebody please explain what problem is being solved with this patch? |
Martin v. Löwis wrote:
The way os.environ is currently set up on Unix breaks applications Please see the discussion on http://bugs.python.org/issue8514 |
I can't see any report of actual breakage in that report, only claims of |
An issue was opened 2 years ago: "It was brought up in a discussion of sending non-ASCii data to a CGI-WSGI script where the data would be transferred via os.environ." => bpo-4006 (closed as "wont fix"). |
Fortunately, that issue could now be reconsidered as "fixed"; the |
Martin v. Löwis wrote:
Set your CODESET to ASCII and watch the surrogate escaping Here's one (RFC 3875, sections 4.1.7 and 4.1.5): LANG = 'en_US.utf8'
CONTENT_TYPE = 'application/x-www-form-urlencoded'
QUERY_STRING = 'type=example&name=Löwis'
PATH_INFO = '/home/löwis/bin/mycgi.py' (HTML uses Latin-1 as default encoding and so do many of the The file system encoding simply doesn't relate to the OS |
I still don't see a *failure* here. AFAICT, it all works correctly. |
BTW, I think you are misinterpreting the RFC. It doesn't actually say "the details of the parsing, reserved characters and support for non Latin-1 is only given as a possible example. Apache passes the URL from QUERY_STRING = 'type=example&name=L%F6wis'
or
QUERY_STRING = 'type=example&name=L%C3%B6wis' IMO, applications are much better off to consider QUERY_STRING as a |
Martin v. Löwis wrote:
Your name will end up being partially escaped as surrogate: 'L\udcf6wis' Further processing will fail, since the application would Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
range(256)
The use of the 'surrogateescape' error handler modifies the This works fine as long as the data is only used *as reference* to It doesn't work if an application tries to work *with* the data, |
That depends on the further processing, no? > Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
> range(256) Where did you get this error from?
Converting it to what?
Parsing will work fine.
It's a string. You shouldn't decode it.
And how would that not happen if it was bytes? The problems you describe |
The problem is that we don't have reliable algorithm to get the encoding of each environment variable. We cannot ensure that all variables are encoded "correctly". Using os.getenvb(), you can decode the string using the right encoding (which may be different for each variable). Marc proposed os.getenv(key, encoding=...) but I don't like this solution: you have to split correctly all unicode things and all bytes things. You should have the choice to keep environment unchanged, as byte strings, and manipulate only byte strings, or to use the classic API (unicode only, os.getenv, os.environ(), ....). os.environb and os.getenvb() will be required to applications in real world application supporting all applications and misconfigured environments. Python3 shouldn't try to fix misconfigured systems but leave this problem to the developer. |
Ok. If that's the motivation, the documentation should make that I also worry that people won't get it right any better than Python: |
Martin v. Löwis wrote:
Please read on: """ I could have also given you an example using 'multipart/form-data' These are not made up examples, they do occur in the real world for
Believe me, I've been working with HTML, forms, web apps, etc. And this is just one example of how the OS environment can Even if these all use UTF-8, a user might still want to stick PEP-383 is nice for file names and paths, but it's unfortunately |
Developers coming from Python2 will continue to use os.getenv() and will not worry about encoding, and maybe not notice that the result is now unicode (and not more a byte string). I think that a developer will only switch to os.getenvb() if he/she has troubles with the encodings. |
That's indeed a positive feature of this proposed change. |
Martin v. Löwis wrote:
>
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
>> Your name will end up being partially escaped as surrogate:
>>
>> 'L\udcf6wis'
>>
>> Further processing will fail
>
> That depends on the further processing, no?
>
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in position 1: ordinal not in
>> range(256)
>
> Where did you get this error from? The roundup email interface must have eaten this
first line of the traceback: >>> _.encode('latin-1')
Martin, it's obvious that you are not even trying to understand |
New version of my patch, which looks much better. Summary: Issue bpo-8603: Create os.environb and os.getenvb() on POSIX system. Changes with my previous patch:
|
@loewis: So do you agree to add os.environb and os.getenvb()? The documentation of os.environb and os.getenvb() in my last patch is very short. I'm not inspired. We told me on IRC to not use function annotations because annotation semantic was not decided yet. I will try to improve the documentation and remove the annotations in my next patch. |
STINNER Victor wrote:
Patch looks good. +1 on adding it. One nit: I'd rename the keymap function to encodekey. |
MaL> Patch looks good. +1 on adding it. Cool. I didn't understood if MvL is +1, but at least he's not -1 on this, and MaL> One nit: I'd rename the keymap function to encodekey. Ok, I will also change that in the final patch. |
I agree with the patch (-2) in principle. I think the error handling py> os.getenvb('PATH')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/martin/work/3k/Lib/os.py", line 484, in getenvb
return environb.get(key, default)
File "/home/martin/work/3k/Lib/_abcoll.py", line 357, in get
return self[key]
File "/home/martin/work/3k/Lib/os.py", line 400, in __getitem__
value = self.data[self.encodekey(key)]
TypeError: string argument without an encoding which then leads to the natural, but incorrect py> os.getenvb('PATH', encoding='ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: getenvb() got an unexpected keyword argument 'encoding' The first error should remain TypeError, but say something like I notice an incompatible change: posix.environ has now a different There is a couple of white-space only changes in the patch; it would be |
"When two paths open to you, you should always choose the most difficult" (in french: "Quand deux chemins s'ouvrent à nous, il faut toujours choisir le plus difficile"). I fixed posixmodule.c instead of my patch :-) |
I don't understand, what is an "element type"? |
In a container, the contents is sometimes called "elements"; their |
Aaaah, *posix*.environ, not *os*.environ, ok. I will fix posix documentation. I didn't knew this dictionary :-) |
Patch version 3:
|
Oh no, I forgot to remove the annotations from getenv() and getenvb() in os.py. I only removed them from the documentation. |
Commited as r80885 (py3k), blocked in 3.1 (r80886). Thank you Martin and Marc for your great help and all your reviews ;-) |
FWIW os.environb is missing from os.__all__. |
A quick search0 also shows that environ.data is used by several projects. Changing it from str to bytes will most likely break these programs, so I'm not sure it's a good idea. |
Ezio Melotti wrote:
Direct use of os.environ.data is not permitted as it is not a documented The few uses you found are easy to fix. |
Le jeudi 29 juillet 2010 03:08:27, Ezio Melotti a écrit :
_Environ is a wrapper on data: call putenv() when a value is changed and It looks like a common usage of .data is to get get to "a real dict" (and not
I'm not sure of that. It looks (in the search result) that os.environ.data is Eg. zope2instance: old_env = os.environ.data.copy()
...
os.environ.data = old_env It does still work with bytes.
Yes, it's not a good idea to use .data :-) This attribute should be protected,
data is shared between os.environ and os.environb, because data is the
No, data should use the native type: bytes on UNIX and BSD, str on Windows. -- If you still consider that the change on .data as a bug, I think that the fix |
Le jeudi 29 juillet 2010 03:00:03, Ezio Melotti a écrit :
Fixed by r83237 |
r84690 marks os.environ.data as protected. Close this issue again. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: