classification
Title: wsgiref.simple_server breaks unicode in URIs
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Add advice about non-ASCII wsgiref PATH_INFO
View: 16679
Assigned To: Nosy List: SilentGhost, animus, grahamd, martin.panter, orsenthil, Александр Эри
Priority: normal Keywords:

Created on 2016-04-20 11:07 by animus, last changed 2016-04-21 04:42 by martin.panter. This issue is now closed.

Files
File name Uploaded Description
t.py animus, 2016-04-20 11:10
Screenshot from 2016-04-20 14-26-03.png animus, 2016-04-20 11:27
Screenshot from 2016-04-20 14-28-03.png animus, 2016-04-20 11:28
Messages (11)
msg263819 - Author: Alexey Gorshkov (animus) Date: 2016-04-20 11:07
Example code is in the attachment.

Example URI: http://127.0.0.1:8005/тест
msg263820 - Author: Александр Эри (Александр Эри) Date: 2016-04-20 11:17
See also issue 26717.
msg263821 - Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-04-20 11:21
What do you mean by "breaks"? Also, why do you encode your string as utf-8?
msg263822 - Author: Alexey Gorshkov (animus) Date: 2016-04-20 11:27
Please take a look at the 'pi:' result; attaching a screenshot.
msg263823 - Author: Alexey Gorshkov (animus) Date: 2016-04-20 11:28
Also attaching the same print output from the console.
msg263827 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-20 12:44
I think this is already covered in Issue 16679. PEP 3333 says it’s meant to work this way.

I admit it is very quirky. See also Issue 22264 discussing future enhancements.
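To make the PEP 3333 behaviour concrete: the server percent-decodes the path to bytes and hands the application a "native string" in which each byte becomes one Latin-1 character. The sketch below assumes a server that decodes roughly the way wsgiref.simple_server does (urllib.parse.unquote with encoding 'iso-8859-1'); the helper names pep3333_path_info and recover_text are hypothetical, chosen for illustration.

```python
import urllib.parse


def pep3333_path_info(raw_path: str) -> str:
    """Hypothetical helper: mimic how a PEP 3333 server builds PATH_INFO.

    Percent-escapes are decoded to bytes, and each byte is mapped to one
    Latin-1 character (a "native string"), as assumed for wsgiref.
    """
    return urllib.parse.unquote(raw_path, encoding="iso-8859-1")


def recover_text(native: str) -> str:
    """Hypothetical helper: get the original bytes back, then decode as UTF-8."""
    return native.encode("iso-8859-1").decode("utf-8")


# What a browser sends for http://127.0.0.1:8005/тест (UTF-8, percent-encoded).
raw = "/%D1%82%D0%B5%D1%81%D1%82"

native = pep3333_path_info(raw)
print(repr(native))          # mojibake: one Latin-1 character per UTF-8 byte
print(recover_text(native))  # /тест
```

The application, not the server, is expected to undo the Latin-1 mapping, which is why the raw PATH_INFO looks "broken" when printed directly.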
msg263830 - Author: Александр Эри (Александр Эри) Date: 2016-04-20 13:07
My browser encodes the URL in UTF-8. To resolve this bug we need to look at the web standards, not at the PEP.
msg263870 - Author: Graham Dumpleton (grahamd) Date: 2016-04-21 03:39
Your code should be written as:

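    import pprint
    import urllib.parse

    # Per PEP 3333, PATH_INFO and QUERY_STRING are "native strings" whose
    # bytes were decoded as Latin-1; re-encode them and decode as UTF-8.
    res = """\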
e:
{}
pi:
{}
qs:
{}
""".format(
        pprint.pformat(e),
        urllib.parse.unquote(e['PATH_INFO'].encode('Latin-1').decode('UTF-8')),
        urllib.parse.parse_qs(urllib.parse.unquote(e['QUERY_STRING'].encode('Latin-1').decode('UTF-8')))
        )
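A self-contained variant of the same idea, written as a minimal WSGI application, is sketched below. It differs from the snippet above in one assumption: that the server (as wsgiref does) has already percent-decoded PATH_INFO, so it is not passed through unquote() a second time, while QUERY_STRING is left raw and parse_qs() does the percent-decoding. The app and call names are hypothetical, and call is only a test harness that builds an environ dict by hand instead of running a real server.

```python
import pprint
import urllib.parse
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app recovering UTF-8 text from PEP 3333 native strings.

    Assumes the client sent UTF-8 (percent-encoded or raw bytes).
    PATH_INFO is assumed to be percent-decoded already by the server, so it
    is only re-encoded/decoded; QUERY_STRING is still raw, so parse_qs()
    handles the %XX escapes (decoding them as UTF-8 by default).
    """
    path = environ.get("PATH_INFO", "").encode("latin-1").decode("utf-8")
    query = environ.get("QUERY_STRING", "").encode("latin-1").decode("utf-8")
    params = urllib.parse.parse_qs(query)

    body = "pi:\n{}\nqs:\n{}\n".format(path, pprint.pformat(params))
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


def call(path_bytes, query=""):
    """Hypothetical harness: build an environ the way a PEP 3333 server might."""
    environ = {"PATH_INFO": path_bytes.decode("latin-1"), "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks).decode("utf-8")


print(call("/тест".encode("utf-8"))[1])
print(call(b"/", "a=%D1%82%D0%B5%D1%81%D1%82")[1])
```

Real code may want decode("utf-8", errors="replace") so that a client sending non-UTF-8 bytes does not raise UnicodeDecodeError.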
msg263871 - Author: Graham Dumpleton (grahamd) Date: 2016-04-21 03:48
There does appear to be something wrong with wsgiref. With that rewritten code, for:

curl http://127.0.0.1:8000/тест

you should get:

pi:
/тест
qs:
{}

and for:

curl http://127.0.0.1:8000/?a=тест

you should get:

pi:
/
qs:
{'a': ['тест']}

The PATH_INFO case appears to fail, though, and outputs:

pi:
/тест
qs:
{}

I don't think I have missed anything.
msg263873 - Author: Graham Dumpleton (grahamd) Date: 2016-04-21 04:09
This gets even weirder.

Gunicorn behaves same as wsgiref.

However, it turns out they both show the unexpected result only when using curl. With Safari, both are fine.

Waitress blows up altogether with an exception when curl is the client, but is fine with Safari and gives what I expect.

My mod_wsgi package gives what I expect whether you use curl or Safari. So Apache may be doing some magic in there to allow it to always work. No idea. But obviously mod_wsgi rules as it works regardless. :-)

uWSGI doesn't want to compile on Mac OS X for me at the moment.

That Apache works properly with both curl and Safari, while the other WSGI servers don't, suggests something is amiss.
msg263877 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-21 04:42
Graham: On my Linux computer, Curl seems to treat the test “URL” as a string of bytes and doesn’t percent-encode it. Therefore you may be affected by Issue 26717, which I fixed the other day. But in real life, URLs are meant to contain only literal ASCII characters (even if they encode other characters), so this shouldn’t be a big problem. Compare IRI vs URI. Browsers tend to percent-encode using UTF-8.
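The IRI-to-URI step that browsers perform can be sketched with urllib.parse: non-ASCII path characters are encoded as UTF-8 and each byte is percent-escaped, leaving a pure-ASCII URI. The variable names below are illustrative only; this is not how curl or any particular browser is implemented, and the raw_line/uri_line request lines merely contrast what reaches the server in each case.

```python
import urllib.parse

# An IRI as typed by a user: the path contains non-ASCII characters.
iri = "http://127.0.0.1:8005/тест"

parts = urllib.parse.urlsplit(iri)
# quote() encodes non-ASCII characters as UTF-8 and percent-escapes the bytes;
# "/" is kept as-is because it is in quote()'s default safe set.
uri = urllib.parse.urlunsplit(parts._replace(path=urllib.parse.quote(parts.path)))
print(uri)  # http://127.0.0.1:8005/%D1%82%D0%B5%D1%81%D1%82

# A client that skips this step puts the raw UTF-8 bytes in the request line;
# a client that performs it sends only ASCII.
raw_line = "GET {} HTTP/1.1".format(parts.path).encode("utf-8")
uri_line = "GET {} HTTP/1.1".format(urllib.parse.urlsplit(uri).path).encode("ascii")
```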
History
Date                 User              Action  Args
2016-04-21 04:42:43  martin.panter     set     messages: + msg263877
2016-04-21 04:09:13  grahamd           set     messages: + msg263873
2016-04-21 03:48:05  grahamd           set     messages: + msg263871
2016-04-21 03:39:39  grahamd           set     nosy: + grahamd
                                               messages: + msg263870
2016-04-20 13:07:12  Александр Эри     set     messages: + msg263830
2016-04-20 12:44:42  martin.panter     set     status: open -> closed
                                               nosy: + martin.panter
                                               messages: + msg263827
                                               superseder: Add advice about non-ASCII wsgiref PATH_INFO
                                               resolution: duplicate
2016-04-20 11:28:31  animus            set     files: + Screenshot from 2016-04-20 14-28-03.png
                                               messages: + msg263823
2016-04-20 11:27:29  animus            set     files: + Screenshot from 2016-04-20 14-26-03.png
                                               messages: + msg263822
2016-04-20 11:21:53  SilentGhost       set     nosy: + SilentGhost, orsenthil
                                               messages: + msg263821
                                               components: + Library (Lib), - Extension Modules
2016-04-20 11:17:43  Александр Эри     set     nosy: + Александр Эри
                                               messages: + msg263820
2016-04-20 11:10:40  animus            set     files: - t.py
2016-04-20 11:10:30  animus            set     files: + t.py
2016-04-20 11:07:30  animus            create