Add fixups for encoding problems to wsgiref #54364
Currently wsgiref's CGIHandler makes a WSGI environ from the CGI environ without changes. Unfortunately the CGI environ is wrong in a number of common circumstances:
Previously, PEP-333 did not make clear what should happen with headers and encodings, especially under Python 3. PEP-3333 clears this up. These patches add fixups to wsgiref that try to generate an environ as close to 'correct' per PEP-3333 as possible for the current platform and server software. They also fix simple_server to use the correct encoding for PATH_INFO and include the fix for bpo-9022, updating the simple_server demo app and tests accordingly so that they conform to PEP-3333's expectation that headers will be ISO-8859-1-decoded Unicode strings. The test_bytes_validation test is removed: as I understand it, byte-string headers and status are no longer allowed.
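A minimal sketch of the PATH_INFO decoding described for simple_server, assuming the request target arrives as a str whose percent-escapes stand for raw request bytes. The helper names `path_info_from_target` and `decode_path_for_app` are hypothetical, not wsgiref API. The idea is that `urllib.parse.unquote` with `encoding='iso-8859-1'` maps each escaped byte to exactly one code point, giving the "bytes-as-unicode" native string PEP-3333 asks for, which the application can re-decode once it knows the real charset.

```python
from urllib.parse import unquote


def path_info_from_target(target: str) -> str:
    """Hypothetical helper: build a PEP-3333-style PATH_INFO from a request target."""
    # The query string belongs in QUERY_STRING, so drop it here.
    path = target.split('?', 1)[0]
    # Each %XX escape becomes one byte, and each byte becomes one code point
    # (ISO-8859-1), so no guess about the client's charset is made.
    return unquote(path, encoding='iso-8859-1')


def decode_path_for_app(path_info: str, charset: str = 'utf-8') -> str:
    """Hypothetical helper: an application recovering text from PATH_INFO."""
    # Undo the ISO-8859-1 mapping to get the original bytes, then decode them
    # with the charset the application actually expects.
    return path_info.encode('iso-8859-1').decode(charset)


raw = path_info_from_target('/caf%C3%A9?lang=fr')
print(ascii(raw))                # '/caf\xc3\xa9'
print(decode_path_for_app(raw))  # /café
```

Deferring the charset decision to the application is the design choice here: the server layer never has to guess, and nothing is lost, because ISO-8859-1 is a one-to-one mapping of all 256 byte values.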
(Patch for Python 2.x, for what it's worth.)
(Same again for PJ Eby's wsgiref svn branch: identical to the previous 2.7 patch apart from the line numbers.)
Your patch adds a new handler, which is arguably a new feature that has to be rejected in a bugfix branch.
Ah, sorry, I submitted the wrong patch against 3.2; please disregard it. Here's the 'proper' version. The functionality isn't changed; the former patch just had an unused clause for reading environb, switched off with 'and False', which in the end I decided not to use because the surrogateescape approach already covers it just as well for values. @Éric: yes. In fact the whole patch is pretty much new functionality, which should not be considered for a 2.7.x bugfix release. I've submitted a patch against 2.7 for completeness and for use by a separately maintained post-2.7 wsgiref, but unless there is ever a Python 2.8 it should never reach the stdlib. The status quo with respect to Unicode in environ is broken and inconsistent, which an accepted PEP-3333 would finally clear up. But there may be deployed webapps that rely on their particular server's current inconsistent environ, and those shouldn't be broken by a 2.7 or 3.1 bugfix release.
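The point that surrogateescape "covers" environb for values rests on the round-trip property of that error handler (PEP 383): any byte sequence decoded with `errors='surrogateescape'` re-encodes to exactly the same bytes. A small sketch checking this, assuming a UTF-8 encoding for the worked value; the function names are illustrative only, and the environb comparison applies only on POSIX builds.

```python
import os
import sys


def decode_like_environ(raw: bytes, encoding: str) -> str:
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF), not errors.
    return raw.decode(encoding, 'surrogateescape')


def recover_bytes(value: str, encoding: str) -> bytes:
    # The same handler maps those surrogates back to the original bytes.
    return value.encode(encoding, 'surrogateescape')


def environ_agrees_with_environb() -> bool:
    """On POSIX, check that recovering bytes from os.environ matches os.environb."""
    if not hasattr(os, 'environb'):
        return True  # Windows: os.environ is natively Unicode, no bytes view
    enc = sys.getfilesystemencoding()
    return all(
        recover_bytes(os.environ[os.fsdecode(key)], enc) == value
        for key, value in os.environb.items()
    )


raw = b'/caf\xc3\xa9/\xff'   # 0xFF is not valid UTF-8
text = decode_like_environ(raw, 'utf-8')
print(ascii(text))                          # '/caf\xe9/\udcff'
print(recover_bytes(text, 'utf-8') == raw)  # True
print(environ_agrees_with_environb())
```

Since both views agree byte for byte, reading environb separately adds nothing for values, which is consistent with dropping the unused clause.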
Committed to Py3K in r86146, with added docs and a larger list of transcodable CGI variables.
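As a rough illustration of what "transcoding" a CGI variable means on a POSIX server: the value Python placed in os.environ is re-encoded with the OS encoding and surrogateescape to recover the original request bytes, which are then decoded as ISO-8859-1. The `read_environ_sketch` function and its `needs_transcode` argument are hypothetical stand-ins, not the committed wsgiref code; Windows servers would need different handling because their environ is natively Unicode.

```python
import sys
from typing import Callable, Mapping


def transcode(value: str, os_encoding: str) -> str:
    """Turn an OS-decoded value back into a PEP-3333 bytes-as-unicode string."""
    raw = value.encode(os_encoding, 'surrogateescape')  # original request bytes
    return raw.decode('iso-8859-1')                      # one code point per byte


def read_environ_sketch(
    source: Mapping[str, str],
    needs_transcode: Callable[[str], bool],
    os_encoding: str = sys.getfilesystemencoding(),
) -> dict:
    """Copy a CGI environ, transcoding only the request-derived variables."""
    return {
        name: transcode(value, os_encoding) if needs_transcode(name) else value
        for name, value in source.items()
    }


cgi_env = {
    'PATH_INFO': '/caf\u00e9',     # decoded from UTF-8 bytes by Python
    'HTTP_X_NAME': 'a\udce9',      # lone byte 0xE9 kept via surrogateescape
    'SERVER_NAME': 'example.org',  # server-supplied, left alone
}
env = read_environ_sketch(
    cgi_env, lambda k: k == 'PATH_INFO' or k.startswith('HTTP_'), 'utf-8'
)
print(ascii(env['PATH_INFO']), ascii(env['HTTP_X_NAME']))  # '/caf\xc3\xa9' 'a\xe9'
```

Which names the predicate accepts is exactly the "list of transcodable CGI variables" question; everything else is a fixed two-step re-decode.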
Thanks. Some of those additions in _needs_transcode are potentially controversial, though. I'm not wholly sure it's right to transcode these; some of them may not actually come from the request at all.

The case of the REDIRECT_HTTP_ and SSL_ envvars is an interesting one. Whilst transcoding them at some point will very probably be what applications need to do if they actually want to use them, is it within CGIHandler's remit to change Apache module-specific variables that are not specified by CGI or WSGI? (There might, after all, be many of these to catch for other modules and servers, and it's *conceivable* that somebody might reuse one of these names to set an environment variable for some other purpose, in which case transcoding would add an unexpected mangling. In the general case we can't expect users to know to avoid envvar names that are used as non-standard extensions by all servers.)

REDIRECT_HTTP_ at least comes from the HTTP request, so I guess the consistency is good there. (But then I think the only one that may actually contain non-ASCII is REDIRECT_URL, which replaces the unescaped SCRIPT_NAME and PATH_INFO; that one isn't caught at the moment.)
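The conservative-versus-liberal question can be made concrete as a predicate over variable names. This is a hypothetical sketch: `REQUEST_VARS` is an assumed list of request-derived CGI variables, not the exact list committed in r86146, and the `liberal` flag switches on the Apache-specific SSL_ and REDIRECT_ prefixes, plus REDIRECT_URL, which the comment notes is not caught.

```python
# Assumed list of CGI variables whose values are taken from the HTTP request.
REQUEST_VARS = frozenset({
    'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING', 'REQUEST_METHOD',
    'AUTH_TYPE', 'CONTENT_TYPE', 'CONTENT_LENGTH',
    'REMOTE_USER', 'REMOTE_IDENT',
})
REDIRECT_PREFIX = 'REDIRECT_'


def needs_transcode(name: str, liberal: bool = True) -> bool:
    """Decide whether a CGI variable should be re-decoded as ISO-8859-1."""
    # Standard CGI request variables and all HTTP headers are always included.
    if name in REQUEST_VARS or name.startswith('HTTP_'):
        return True
    if not liberal:
        return False
    # Apache mod_ssl variables (certificate fields and the like).
    if name.startswith('SSL_'):
        return True
    # Apache re-exports variables with a REDIRECT_ prefix after an internal
    # redirect; REDIRECT_URL holds the URL path from before the redirect.
    if name.startswith(REDIRECT_PREFIX):
        inner = name[len(REDIRECT_PREFIX):]
        return inner == 'URL' or needs_transcode(inner, liberal)
    return False
```

With `liberal=False`, only the listed request variables and HTTP_ headers qualify, so names such as SSL_CLIENT_S_DN and REDIRECT_URL pass through untouched; the recursion handles chained prefixes such as REDIRECT_REDIRECT_PATH_INFO.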
So, do you have any suggestions for a specific change to the patch?
No, not specifically. My patch is conservative about which variables it recodes and yours is more liberal, but it's difficult to say which is the better approach, or what PEP-3333 requires. If you're happy with the current patch, go ahead and let's have it for 3.2; I don't foresee significant problems with it. It's unlikely that anyone is going to reuse the SSL_ or REDIRECT_ variable names for anything other than what Apache uses them for. There might be some confusion among IIS users over which encoding REMOTE_USER should be in, but I can't see any consistent resolution to that issue, and we'll certainly be in a better position than we are now.
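On Windows there are no raw bytes to recover, so any fix-up has to guess how the server decoded the request, which is why values such as REMOTE_USER under IIS stay ambiguous. The sketch below picks a guess from SERVER_SOFTWARE; the per-server rules (IIS decoding as UTF-8, Apache mod_cgi already storing bytes-as-unicode, other servers using the ANSI code page) are assumptions for illustration, and `recover_native_windows` is a hypothetical name.

```python
def recover_native_windows(value: str, server_software: str,
                           code_page: str = 'mbcs') -> str:
    """Guess the request bytes behind a Unicode environ value, then map them
    one-to-one onto code points (ISO-8859-1) as PEP-3333 expects."""
    software = server_software.lower()
    if software.startswith('microsoft-iis/'):
        # Assumption: IIS decoded the request as UTF-8, so re-encode as UTF-8.
        return value.encode('utf-8').decode('iso-8859-1')
    if software.startswith('apache/'):
        # Assumption: Apache mod_cgi already wrote bytes-as-unicode values.
        return value
    # Fallback guess: the server used the ANSI code page. 'mbcs' exists only
    # on Windows; unmappable characters become '?', so this step is lossy.
    return value.encode(code_page, 'replace').decode('iso-8859-1')
```

None of these guesses can be checked from inside the CGI script: the same Unicode value is a plausible result of several decodings, so a wrong guess silently yields different bytes.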
(Belated close: fixed.)