Message225814
The WSGI 1.1 standard mandates that binary data be decoded as latin-1 text: http://www.python.org/dev/peps/pep-3333/#unicode-issues
This means that many WSGI headers will in fact contain *improperly encoded data*. Developers working directly with WSGI (rather than using a WSGI framework like Django, Flask or Pyramid) need to convert those strings back to bytes and decode them properly before passing them on to user applications.
I suggest adding a simple "fix_encoding" function to wsgiref that covers this:
def fix_encoding(data, encoding, errors="surrogateescape"):
    return data.encode("latin-1").decode(encoding, errors)
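To illustrate the problem, here is a minimal sketch of the round trip. It defines the proposed function locally, since it is only a suggestion and not part of wsgiref; the example path and the variable names are assumptions chosen for illustration.

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    """Undo the latin-1 decoding applied under PEP 3333, then decode properly."""
    return data.encode("latin-1").decode(encoding, errors)


# Bytes as a client might send them: a UTF-8 encoded path (hypothetical value).
raw_path = "/caf\u00e9".encode("utf-8")

# What the application receives after the server decodes those bytes as latin-1.
wsgi_path = raw_path.decode("latin-1")

# Recover the original bytes and decode them with the real encoding.
fixed_path = fix_encoding(wsgi_path, "utf-8")

print(repr(wsgi_path))   # the mis-decoded text: '/cafÃ©'
print(repr(fixed_path))  # the intended text: '/café'

# With the default "surrogateescape" handler, bytes that are not valid in the
# target encoding are kept as lone surrogates instead of raising an error.
print(repr(fix_encoding("\xff", "utf-8")))  # '\udcff'
```

Because latin-1 maps every byte to exactly one code point, the encode step always recovers the original bytes without loss, which is what makes the second decode safe.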
The primary intended benefit is to make WSGI-related code more self-documenting. Compare the proposal with the status quo:
data = wsgiref.fix_encoding(data, "utf-8")
data = data.encode("latin-1").decode("utf-8", "surrogateescape")
The proposal hides the mechanical details of what is going on in order to emphasise *why* the change is needed, and provides you with a name to go look up if you want to learn more.
The latter just looks nonsensical unless you're already familiar with this particular corner of the WSGI specification.
Date | User | Action | Args
2014-08-24 12:45:41 | ncoghlan | set | recipients: + ncoghlan
2014-08-24 12:45:41 | ncoghlan | set | messageid: <1408884341.89.0.273491669506.issue22264@psf.upfronthosting.co.za>
2014-08-24 12:45:41 | ncoghlan | link | issue22264 messages
2014-08-24 12:45:41 | ncoghlan | create |