This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author ncoghlan
Recipients Arfrever, ezio.melotti, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-08-24.13:00:36
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1408885236.48.0.0800432476105.issue18814@psf.upfronthosting.co.za>
In-reply-to
Content
The redecode thing is a distraction from my core concern here, so I've split that out to issue #22264, a separate RFE for a "wsgiref.fix_encoding" function.

For this issue, my main concern is the function to *clean* a string of escaped binary data, so it can be displayed easily, or otherwise purged of the escaped characters. Preserving the data by default is good, but you have to know a *lot* about how Python 3 works in order to be able figure out how to clean it out.

For that, not knowing Unicode in general isn't the problem: it's not knowing PEP 383. If we forget the idea of exposing the constant with the escaped values (I agree that's not very useful), it suggests "codecs.clean_surrogate_escapes" as a possible name:


    # Helper to ensure a string contains no escaped surrogates
    # This allows it to be safely encoded without surrogateescape
    _extended_ascii = bytes(range(128, 256))
    _escaped_surrogates = _extended_ascii.decode('ascii',
                                    errors='surrogateescape')
    _match_escaped = re.compile('[{}]'.format(_escaped_surrogates))
    def clean_surrogate_escapes(s, repl='\ufffd'):
        return _match_escaped.sub(repl, s)

A more efficient implementation in C would also be fine, this is just an easy way to define the exact semantics.

(I also just noticed that unlike other error handlers, surrogateespace and surrogatepass do not have corresponding codecs.surrogateescape_errors and codecs.surrogatepass_errors functions)
History
Date User Action Args
2014-08-24 13:00:36ncoghlansetrecipients: + ncoghlan, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2014-08-24 13:00:36ncoghlansetmessageid: <1408885236.48.0.0800432476105.issue18814@psf.upfronthosting.co.za>
2014-08-24 13:00:36ncoghlanlinkissue18814 messages
2014-08-24 13:00:36ncoghlancreate