
Author ncoghlan
Recipients Arfrever, ezio.melotti, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-08-24.08:24:10
Message-id <1408868650.37.0.873787399213.issue18814@psf.upfronthosting.co.za>
Content
The purpose of these changes is to provide tools specifically for working with surrogate-escaped data, not for working with arbitrary lone Unicode surrogates.

"escaped_surrogates" is not defined by the Unicode spec; it is defined by the behaviour of the surrogateescape error handler, which lets us tunnel arbitrary bytes through str objects and reproduce them faithfully at the far end. On reflection, I think codecs would be a better home than string (since that is where the error handler is defined), but it doesn't belong in unicodedata.
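To illustrate that behaviour, here is a minimal sketch of the surrogateescape round trip. The `escaped_surrogates` constant is defined here only for illustration (it is not an existing stdlib name); it assumes the handler's documented mapping of undecodable bytes 0x80-0xFF onto the code points U+DC80-U+DCFF.

```python
# Illustrative constant mirroring the proposed "escaped_surrogates" name:
# surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF.
escaped_surrogates = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))

raw = b"caf\xe9 \xff"  # Latin-1 style bytes, not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # ascii() avoids encoding errors when printing surrogates

# Every non-ASCII character in the result is one of the escaped surrogates...
assert all(ch in escaped_surrogates for ch in text if not ch.isascii())
# ...and encoding with the same handler reproduces the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw
```

The set is exactly 128 code points, which is why it can be treated as a well-defined constant even though Unicode itself gives it no special status.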

I'd be OK with changing the name of the clean function to "clean_escaped_surrogates".
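A possible shape for the renamed function, as a sketch only: `clean_escaped_surrogates` and its `replacement` parameter are hypothetical, and replacing each escaped byte with U+FFFD is an assumed default. It deliberately leaves other lone surrogates (such as U+D800) untouched, since the tools are meant for surrogate-escaped data only.

```python
# Hypothetical helper: the name follows the suggestion above, while the
# signature and the U+FFFD default are assumptions for illustration.
_ESCAPED_RANGE = range(0xDC80, 0xDD00)  # code points produced by surrogateescape


def clean_escaped_surrogates(text: str, replacement: str = "\ufffd") -> str:
    """Replace each surrogate-escaped byte in `text` with `replacement`."""
    # str.translate accepts a dict mapping code points to replacement strings.
    return text.translate(dict.fromkeys(_ESCAPED_RANGE, replacement))


dirty = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'
assert clean_escaped_surrogates(dirty) == "caf\ufffd"
assert clean_escaped_surrogates(dirty, "?") == "caf?"
# Lone surrogates outside U+DC80..U+DCFF are not escaped bytes, so they are kept.
assert clean_escaped_surrogates("\ud800") == "\ud800"
```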

Needing redecode is not a bug: it's baked into the WSGI spec in PEP 3333. I would be OK with providing it in wsgiref rather than the codecs or string modules, but I think we should provide it somewhere.
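For reference, the PEP 3333 case looks roughly like this: servers hand environ values to the application as native strings decoded as ISO-8859-1, so an application expecting UTF-8 has to recover the bytes and decode them again. The `redecode` signature below (including `source_encoding` and `source_errors`) is an assumption for illustration, not an existing API.

```python
def redecode(value: str, encoding: str = "utf-8", errors: str = "strict", *,
             source_encoding: str = "latin-1", source_errors: str = "strict") -> str:
    """Hypothetical helper: recover the original bytes of `value`, then decode again."""
    return value.encode(source_encoding, source_errors).decode(encoding, errors)


# A PEP 3333 server passes the UTF-8 path b"/caf\xc3\xa9" as a latin-1 native string.
path_info = b"/caf\xc3\xa9".decode("latin-1")
assert path_info == "/caf\u00c3\u00a9"
assert redecode(path_info) == "/caf\u00e9"

# The same helper covers text decoded with surrogateescape (as os.environ is on POSIX).
escaped = b"/caf\xc3\xa9".decode("ascii", "surrogateescape")
assert redecode(escaped, source_encoding="ascii",
                source_errors="surrogateescape") == "/caf\u00e9"
```

Keeping both the source and target codecs as parameters is what would let such a helper live in codecs or string just as easily as in wsgiref.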