
Author ncoghlan
Recipients ncoghlan
Date 2014-08-24.12:45:41
Message-id <1408884341.89.0.273491669506.issue22264@psf.upfronthosting.co.za>
Content
The WSGI 1.0.1 standard (PEP 3333) mandates that binary data be decoded as latin-1 text: http://www.python.org/dev/peps/pep-3333/#unicode-issues

This means that many WSGI headers will in fact contain *improperly encoded data*. Developers working directly with WSGI (rather than using a WSGI framework like Django, Flask or Pyramid) need to convert those strings back to bytes and decode them properly before passing them on to user applications.
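
As a minimal sketch of the problem, assume a client sends a UTF-8 encoded path (the value below is made up for illustration). The server hands the application the latin-1 decoding of those bytes, and the application has to round-trip through latin-1 to get the intended text back:

```python
# Hypothetical raw bytes a server might receive for a request path (UTF-8).
raw = "/café".encode("utf-8")

# Per PEP 3333, the server decodes those bytes as latin-1 for the environ.
as_seen_by_app = raw.decode("latin-1")

# Recover the original bytes, then decode them with the real encoding.
recovered = as_seen_by_app.encode("latin-1").decode("utf-8", "surrogateescape")
```

Here `as_seen_by_app` contains two mangled characters in place of the single "é", while `recovered` is the text the client actually meant.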

I suggest adding a simple "fix_encoding" function to wsgiref that covers this:

    def fix_encoding(data, encoding, errors="surrogateescape"):
        return data.encode("latin-1").decode(encoding, errors)

The primary intended benefit is to make WSGI-related code more self-documenting. Compare the proposal with the status quo:

    data = wsgiref.fix_encoding(data, "utf-8")
    data = data.encode("latin-1").decode("utf-8", "surrogateescape")

The proposal hides the mechanical details of what is going on in order to emphasise *why* the change is needed, and provides you with a name to go look up if you want to learn more.

The latter just looks nonsensical unless you're already familiar with this particular corner of the WSGI specification.
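
The "surrogateescape" default may also deserve a short illustration. The sketch below defines the proposed helper locally (it is a proposal, not something imported from wsgiref here) and feeds it a made-up environ value whose underlying bytes are not valid UTF-8. Under that assumption, the undecodable byte is kept as a lone surrogate rather than raising, so the original bytes remain recoverable:

```python
def fix_encoding(data, encoding, errors="surrogateescape"):
    # Same body as the proposed helper, defined locally for this sketch.
    return data.encode("latin-1").decode(encoding, errors)

# Hypothetical environ entry: the underlying bytes are latin-1, not UTF-8.
environ = {"PATH_INFO": b"/caf\xe9".decode("latin-1")}

# Decoding as UTF-8 does not raise; byte 0xE9 is preserved as U+DCE9.
fixed = fix_encoding(environ["PATH_INFO"], "utf-8")

# Re-encoding with the same error handler restores the original bytes.
original_bytes = fixed.encode("utf-8", "surrogateescape")
```

This is why a lossless error handler is a reasonable default: code that guessed the wrong encoding can still get back to the raw bytes and try again.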