
Author sjt
Recipients Arfrever, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, sjt, vstinner
Date 2015-05-09.07:53:04
Message-id <1431157984.91.0.636458618195.issue18814@psf.upfronthosting.co.za>
Content
Please do not add the "rehandle" functions to codecs.  A codec changes the (duck-typed) representation of data while maintaining its semantics; these functions do the reverse, changing the semantics of data while retaining its representation.

I suggest a "validation" submodule of the unicodedata package, or perhaps a new "unicodeutils" package, for these functions, as well as those that just detect the surrogates, etc.
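
As a rough illustration of the "just detect the surrogates" kind of function, here is a minimal sketch. The name contains_surrogates is hypothetical and is not an existing unicodedata or codecs API.

```python
def contains_surrogates(text: str) -> bool:
    """Return True if text holds any code point in the surrogate range.

    Hypothetical helper: it only inspects the string and never modifies it.
    """
    return any("\ud800" <= ch <= "\udfff" for ch in text)


# A str produced by decoding undecodable bytes with "surrogateescape"
# carries lone surrogates; ordinary text does not.
escaped = b"caf\xe9".decode("utf-8", errors="surrogateescape")
plain = "caf\u00e9"
```

A detector like this leaves the data untouched, so it raises none of the concerns about changed semantics.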

Because they change the semantics of data, they should be documented as potentially dangerous: their output can't be inverted back to the original bytes without knowledge of the history of transformations performed (and not even then in the case of the "replace" error handler).  This matters in applications where the input bytes may have been digitally signed, for example.
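
The loss of invertibility can be sketched with the standard error handlers alone. In the example below, clean_surrogates is a hypothetical stand-in for a "rehandle"-style function (not the proposed API), and a SHA-256 digest stands in for a real signature check.

```python
import hashlib


def clean_surrogates(text: str) -> str:
    """Hypothetical stand-in: replace lone surrogates via the "replace" handler."""
    return text.encode("utf-8", errors="replace").decode("utf-8")


# Two different byte strings, neither of which is valid UTF-8.
raw_a = b"caf\xe9"
raw_b = b"caf\xe8"

# "surrogateescape" smuggles the undecodable byte through as a lone surrogate,
# so the original bytes can still be recovered.
text_a = raw_a.decode("utf-8", errors="surrogateescape")
text_b = raw_b.decode("utf-8", errors="surrogateescape")
round_trip_ok = text_a.encode("utf-8", errors="surrogateescape") == raw_a

# After cleaning, both inputs collapse to the same string, so the
# original bytes are no longer recoverable from the text.
cleaned_a = clean_surrogates(text_a)
cleaned_b = clean_surrogates(text_b)
collapsed = cleaned_a == cleaned_b

# A digest over the recovered bytes no longer matches the original input.
recovered = cleaned_a.encode("utf-8", errors="surrogateescape")
digest_matches = hashlib.sha256(recovered).digest() == hashlib.sha256(raw_a).digest()
```

Before cleaning, the round trip succeeds; after cleaning, two distinct inputs are indistinguishable and the digest check fails.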