Author ncoghlan
Recipients ncoghlan
Date 2013-08-23.04:02:31
Message-id <1377230552.02.0.907422956718.issue18814@psf.upfronthosting.co.za>
In-reply-to
Content
Prompted by issue 18713 and http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/, here are some possible utilities we could add to the codecs module to help handle and debug problems with surrogate-escaped strings:

    def has_escaped_bytes(s):
        """Returns true if string contains surrogate escaped bytes"""
        ...

    def replace_escaped_bytes(s):
        """Replaces each surrogate escaped byte with a valid code point"""
        ...

    def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
        """Reinterprets incorrectly decoded text using a new encoding"""
        return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)
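
For illustration only, the three helpers above could be sketched roughly as follows. This is not a proposed patch: it assumes that "escaped bytes" means the lone surrogates U+DC80 to U+DCFF that the 'surrogateescape' error handler emits for undecodable bytes, and the `replacement` parameter (defaulting to U+FFFD) is a hypothetical addition that is not part of the signatures listed above.

```python
import re

# Assumption: 'surrogateescape' maps each undecodable byte 0x80..0xFF to a
# lone surrogate in the range U+DC80..U+DCFF, so that is what we look for.
_ESCAPED_BYTE = re.compile('[\udc80-\udcff]')


def has_escaped_bytes(s):
    """Returns true if string contains surrogate escaped bytes"""
    return _ESCAPED_BYTE.search(s) is not None


def replace_escaped_bytes(s, replacement='\ufffd'):
    """Replaces each surrogate escaped byte with a valid code point"""
    # 'replacement' is a hypothetical parameter; U+FFFD is just one choice.
    return _ESCAPED_BYTE.sub(replacement, s)


def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterprets incorrectly decoded text using a new encoding"""
    return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)


# Usage: Latin-1 data that was wrongly decoded as ASCII.
raw = 'café'.encode('latin-1')
text = raw.decode('ascii', 'surrogateescape')

print(has_escaped_bytes(text))
print(replace_escaped_bytes(text))
print(decode_escaped_bytes(text, 'ascii', 'latin-1'))
```

Note that decode_escaped_bytes only round-trips cleanly when the original decode used 'surrogateescape' with nominal_encoding, since that is what preserves the raw bytes.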
History
Date                 User      Action  Args
2013-08-23 04:02:32  ncoghlan  set     recipients: + ncoghlan
2013-08-23 04:02:32  ncoghlan  set     messageid: <1377230552.02.0.907422956718.issue18814@psf.upfronthosting.co.za>
2013-08-23 04:02:31  ncoghlan  link    issue18814 messages
2013-08-23 04:02:31  ncoghlan  create